dreamone5156
2011-02-13 17:33

Constructing a regular expression to extract multiple pieces of data


I need a regular expression to get the Event, Name, School, Final Swim Time, and Swim Threshold (the DIIA) from a results page like the one at http://www.gliac.org/sports/mswimdive/2010-11/stats/Results_Wed_Finals.htm. Note that the results are separated from the rest of the page by the <pre> HTML tag.

Each "line" looks like this:

1 Donahue, Maura            19 INDY                10:39.77   10:03.60 DIIA

Unfortunately, I'm not sure exactly how to do this. One of the problems (in my mind!) is that sometimes the swimmer's age (19) is displayed and sometimes it isn't. In addition, some results show the seed time (10:39.77) as well as the final time (10:03.60), while others show only the final time.

I started by trying to write a regex that splits at the "," between the last and first name, but failed miserably.

I'm using simple_html_dom (the PHP Simple HTML DOM Parser) to extract the contents of the HTML page.
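For reference, pulling out just the text of the first <pre> element can also be done with PHP's bundled DOM extension instead of simple_html_dom. This is a swapped-in technique, not the library named above; extract_pre_text() is a hypothetical helper, the HTML string is made-up sample input, and the sketch assumes the real page keeps its results in the first <pre> element.

```php
<?php
// Sketch: return the text of the first <pre> element, or null if there is none.
// Uses PHP's bundled DOM extension as an alternative to simple_html_dom.
// extract_pre_text() is a hypothetical helper, not part of any library.
function extract_pre_text(string $html): ?string
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; keep libxml warnings quiet.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $pres = $doc->getElementsByTagName('pre');
    if ($pres->length === 0) {
        return null; // no results block on this page
    }
    return $pres->item(0)->textContent;
}

// Made-up page standing in for the real results page.
$sample = "<html><body><h1>Results</h1><pre>\n"
    . "1 Donahue, Maura            19 INDY                10:39.77   10:03.60 DIIA\n"
    . "</pre></body></html>";

$text = extract_pre_text($sample);
echo $text === null ? "no <pre> found\n" : $text;
```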

My code looks like this (I'm using PHP):

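    $results_url = "http://www.gliac.org/sports/mswimdive/2010-11/stats/Results_Wed_Finals.htm";
    // Create a DOM object from a URL (file_get_html() comes from simple_html_dom)
    $html = file_get_html($results_url);
    // The results live inside a <pre> element; flag pages that lack one
    if (!$html->find('pre')) {
        $parse_error = "Yes";
    }
    if (!isset($parse_error)) {
        $regex = "/[0-9]+(?=[ \s]+)(?=[A-Za-z]+)/";
        // preg_split(pattern, subject, limit, flags): -1 means "no limit"
        $splits = preg_split($regex, $html, -1, PREG_SPLIT_DELIM_CAPTURE);
        print_r($splits);
    }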
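preg_split() takes the limit as its third argument and the flags as its fourth, so a flag such as PREG_SPLIT_DELIM_CAPTURE only takes effect when it follows a limit (-1 means no limit). As a rough sketch of a simpler first step, the text inside <pre> can be broken into one string per result line before any field-level regex is applied. The sample text is illustrative; the "Doe, Jane" line (no age, no seed time) is invented.

```php
<?php
// Sketch: split a block of results text into non-empty lines.
// \R matches any newline sequence (\n, \r\n or \r);
// PREG_SPLIT_NO_EMPTY drops the empty strings left by blank lines.
$pre_text = "\n"
    . "1 Donahue, Maura            19 INDY                10:39.77   10:03.60 DIIA\n"
    . "\n"
    . "2 Doe, Jane                  TEAM                10:05.12 DIIA\n";

$lines = preg_split('/\R/', $pre_text, -1, PREG_SPLIT_NO_EMPTY);

foreach ($lines as $i => $line) {
    echo $i, ': ', $line, "\n";
}
```

Working line by line keeps each regex small, and lines that don't look like results (headers, separators) can simply be skipped when they fail to match.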

If you can help out or point me in the right direction, that would be awesome! Is it even possible to run a regex against the results to extract this data?

Thank you!


2 Answers

  • doumuyu0837 (10 years ago)

    I won't pretend to know what all those numbers mean, but here's something to get you started on the first line of each swimmer's result.

    preg_match_all('/(?P<position>[0-9-]+)\s+(?P<last>[a-z]+)\s*,\s*(?P<first>[a-z]+)\s+((?P<age>[0-9]{2})\s)?(?P<school>[a-z -]+[a-z])\s+(?P<seed>(NT|[0-9:.]+))\s+(?P<final>[0-9:\.]+)\s+(?P<division>[a-z]+)/is', $html, $matches);
    print_r($matches);
    

    The regex is very basic and seems to work for now, but when dealing with content you don't control, you may want to account for a lot more. For instance, the name matching currently won't work for names containing accented characters or punctuation, such as O'Reilly.
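    One possible extension of the regex above, sketched under assumptions about the data that are not verified against every results page: the seed time is optional (so lines with only a final time also match), the last name is "anything up to the comma" (so O'Reilly and accented names pass), and \h (horizontal whitespace) together with the m flag keeps each match on a single line. PREG_UNMATCHED_AS_NULL requires PHP 7.2 or later. The first sample line comes from the question; the second is invented.

```php
<?php
// Sketch: parse every result line of the <pre> text in one pass.
// Assumptions (not checked against every GLIAC page):
//   - seed time is optional; age is an optional two-digit number;
//   - the last name is anything up to the comma (allows O'Reilly, accents);
//   - school names contain no digits.
// Note: the pattern is single-quoted, so it must not contain apostrophes.
$pattern = '/^\h*
    (?P<position>[\d-]+)\h+
    (?P<last>[^,\r\n]+?)\h*,\h*        # last name: anything up to the comma
    (?P<first>\S+)\h+                  # first name: one word
    (?:(?P<age>\d{2})\h+)?             # optional two-digit age
    (?P<school>[^\d\r\n]*?[^\d\s])\h+  # school: shortest run without digits
    (?:(?P<seed>NT|[\d:.]+)\h+)?       # optional seed time or NT
    (?P<final>[\d:.]+)\h+              # final time
    (?P<division>[A-Za-z]+)\h*\r?$     # qualifying standard such as DIIA
/mx';

$pre_text = "1 Donahue, Maura            19 INDY                10:39.77   10:03.60 DIIA\n"
          . "2 Doe, Jane                  TEAM                10:05.12 DIIA\n";

// PREG_SET_ORDER: one array per matched line;
// PREG_UNMATCHED_AS_NULL: missing age/seed come back as null instead of "".
preg_match_all($pattern, $pre_text, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

foreach ($matches as $m) {
    printf("%s, %s | age=%s | %s | seed=%s | final=%s | %s\n",
        $m['last'], $m['first'], $m['age'] ?? '-', $m['school'],
        $m['seed'] ?? '-', $m['final'], $m['division']);
}
```

    Making the school match lazy matters: with a greedy school group, a line whose seed is "NT" would fold "NT" into the school name.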

  • dongnaoben4456 (10 years ago)

    Sounds like you could use either preg_match() or preg_match_all() (see the links below).

    http://php.net/manual/en/function.preg-match-all.php

    http://php.net/manual/en/function.preg-match.php
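    The practical difference between the two functions, shown on a made-up snippet: preg_match() stops at the first match and returns 1 or 0, while preg_match_all() collects every match and returns how many it found. For a results block with many swimmers, preg_match_all() (or preg_match() applied line by line) is the natural fit.

```php
<?php
// preg_match(): first match only; returns 1 (found) or 0 (not found).
// preg_match_all(): every match; returns the number of matches.
$text = "10:39.77   10:03.60 DIIA\n10:05.12 DIIA\n";
$time = '/\d+:\d{2}\.\d{2}/';   // swim time such as 10:03.60

$one = preg_match($time, $text, $first);
$all = preg_match_all($time, $text, $every);

echo $one, ' ', $first[0], "\n";                // 1 10:39.77
echo $all, ' ', implode(',', $every[0]), "\n";  // 3 10:39.77,10:03.60,10:05.12
```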
