I need a regular expression to get the Event, Name, School, Final Swim Time, and Swim Threshold (the DIIA) from a results page like the one at http://www.gliac.org/sports/mswimdive/2010-11/stats/Results_Wed_Finals.htm. Note that the results are separated from the rest of the page by the "pre" HTML tag.
Each "line" looks like this:
1 Donahue, Maura 19 INDY 10:39.77 10:03.60 DIIA
Unfortunately, I'm not sure exactly how to do so. One of the problems (in my mind!) is that sometimes it displays the swimmer's age (19) and other times it doesn't. In addition, sometimes the results show the seed time (10:39.77) and other times only the final time (10:03.60).
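To make the optional fields concrete, here is a minimal sketch of how a single result line might be matched with optional groups for the age and the seed time. The function name `parse_result_line` and the group names are hypothetical, and the pattern assumes the school is a single token without spaces (like INDY) and that the name has the form "Last, First"; real pages may need a looser pattern.

```php
<?php
// Hypothetical helper: parses one result line into named fields.
// Assumptions (not confirmed by the page): school is one token without
// spaces, the name is "Last, First", and age/seed/threshold may be absent.
function parse_result_line(string $line): ?array
{
    $pattern = '/^\s*(?<place>\d+)\s+'          // finishing place
             . '(?<name>[^,]+,\s*\S+)\s+'       // "Last, First"
             . '(?:(?<age>\d{1,2})\s+)?'        // optional age
             . '(?<school>\S+)\s+'              // school code
             . '(?:(?<seed>\d[\d:.]*)\s+)?'     // optional seed time
             . '(?<final>\d[\d:.]*)'            // final time
             . '(?:\s+(?<threshold>[A-Z]+))?'   // optional threshold, e.g. DIIA
             . '\s*$/';

    if (!preg_match($pattern, $line, $m)) {
        return null; // not a result line
    }

    // Unmatched optional groups come back as '' or are missing entirely,
    // so normalise both cases to null.
    $fields = [];
    foreach (['place', 'name', 'age', 'school', 'seed', 'final', 'threshold'] as $key) {
        $fields[$key] = (isset($m[$key]) && $m[$key] !== '') ? $m[$key] : null;
    }
    return $fields;
}

print_r(parse_result_line('1 Donahue, Maura 19 INDY 10:39.77 10:03.60 DIIA'));
```

When only one time is present, the optional seed group is skipped and that time ends up in `final`, which matches the case where the page shows just the final time.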
I started the regex by trying to split up to the "," in the first name, but failed miserably.
I'm using simple_html to extract the contents of the HTML page.
My code looks like this (I'm using PHP):
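$results_url = "http://www.gliac.org/sports/mswimdive/2010-11/stats/Results_Wed_Finals.htm";
// Create a DOM object from a URL
$html = file_get_html($results_url);
// The results live inside the first <pre> element
$pre = $html ? $html->find('pre', 0) : null;
if (!$pre) {
    $parse_error = "Yes";
}
if (!isset($parse_error)) {
    // Attempt: split on a number that is followed by whitespace and then letters
    $regex = "/[0-9]+(?=\s+[A-Za-z]+)/";
    // Flags are the fourth argument of preg_split; -1 means "no limit"
    $splits = preg_split($regex, $pre->plaintext, -1, PREG_SPLIT_DELIM_CAPTURE);
    print_r($splits);
}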
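Since the Event is needed for each swimmer as well, one possible approach is to walk the text of the "pre" block line by line and remember the most recent event heading. The sketch below assumes headings look like "Event 1  Women 1000 Yard Freestyle" and that result lines start with a place number; both are assumptions about the page layout, and `group_lines_by_event` is a hypothetical name.

```php
<?php
// Hypothetical helper: pairs each result line with the latest event heading.
// Assumes headings of the form "Event <number>  <title>" and result lines
// that begin with a place number; adjust both patterns to the real page.
function group_lines_by_event(string $preText): array
{
    $grouped = [];
    $event = null;

    // \R matches any line-break sequence (\n, \r\n, \r)
    foreach (preg_split('/\R/', $preText) as $line) {
        if (preg_match('/^\s*Event\s+\d+\s+(.+?)\s*$/', $line, $m)) {
            // Start (or continue) the bucket for this event title
            $event = $m[1];
            $grouped[$event] = $grouped[$event] ?? [];
        } elseif ($event !== null && preg_match('/^\s*\d+\s+\S/', $line)) {
            // Looks like a result line: keep it under the current event
            $grouped[$event][] = trim($line);
        }
    }
    return $grouped;
}
```

With Simple HTML DOM, the input would be something like `$html->find('pre', 0)->plaintext`, and each collected line could then be run through a per-line regex.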
If you can help out or point me in the right direction, that would be awesome! Is it even possible to run a regex against the results to extract this data?
Thank you!