dreamone5156 2011-02-13 17:33

Constructing a regular expression to extract multiple pieces of data

I need a regular expression to get the Event, Name, School, Final Swim Time, and Swim Threshold (the "DIIA") from a results page like the one at ( http://www.gliac.org/sports/mswimdive/2010-11/stats/Results_Wed_Finals.htm ). Note that the results are separated from the rest of the page by the "pre" HTML tag.

Each "line" looks like this:

1 Donahue, Maura            19 INDY                10:39.77   10:03.60 DIIA

Unfortunately, I'm not sure exactly how to do so. One of the problems (in my mind!) is that sometimes a line displays the swimmer's age (19) and other times it doesn't. In addition, sometimes a result shows the seed time (10:39.77) and other times it only has the final time (10:03.60).

I started the regex by trying to split up to the "," in the first name, but failed miserably.

I'm using simple_html to extract the contents of the HTML page.

My code looks like this (I'm using PHP):

$results_url = "http://www.gliac.org/sports/mswimdive/2010-11/stats/Results_Wed_Finals.htm";
// Create a DOM object from a URL
$html = file_get_html($results_url);
// Flag a parse error if the page failed to load or has no <pre> block
if (!$html || !$html->find('pre')) {
    $parse_error = "Yes";
}
if (!isset($parse_error)) {
    $regex = "/[0-9]+(?=[ \s]+)(?=[A-Za-z]+)/";
    // The third argument of preg_split() is the limit; flags go fourth
    $splits = preg_split($regex, $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    print_r($splits);
}

If you can help out or point me in the right direction, that would be awesome! Is it even possible to run a regex against the results to extract this data?

Thank you!

  • doumuyu0837 2011-02-13 18:40

    I won't pretend to know what all those numbers mean, but here's something to start you off with the first line for each person.

    preg_match_all('/(?P<position>[0-9-]+)\s+(?P<last>[a-z]+)\s*,\s*(?P<first>[a-z]+)\s+((?P<age>[0-9]{2})\s)?(?P<school>[a-z -]+[a-z])\s+(?P<seed>(NT|[0-9:.]+))\s+(?P<final>[0-9:\.]+)\s+(?P<division>[a-z]+)/is', $html, $matches);
    print_r($matches);
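
By default, preg_match_all() groups its output by capture group, which can be awkward to read in print_r(). As a rough sketch, passing PREG_SET_ORDER returns one array per matched line instead, so the named groups can be read row by row. The pattern below is the one from this answer, unchanged; the $text sample is the single line quoted in the question, and the variable names $text, $pattern, and $rows are only illustrative.

```php
<?php
// Sample line copied from the question (stands in for the <pre> contents).
$text = "1 Donahue, Maura            19 INDY                10:39.77   10:03.60 DIIA\n";

// Pattern from the answer above, unchanged.
$pattern = '/(?P<position>[0-9-]+)\s+(?P<last>[a-z]+)\s*,\s*(?P<first>[a-z]+)\s+((?P<age>[0-9]{2})\s)?(?P<school>[a-z -]+[a-z])\s+(?P<seed>(NT|[0-9:.]+))\s+(?P<final>[0-9:\.]+)\s+(?P<division>[a-z]+)/is';

// PREG_SET_ORDER yields one array per matched line rather than one per group.
preg_match_all($pattern, $text, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    // Each $row carries the named groups for a single swimmer.
    printf(
        "%s %s (%s): %s %s\n",
        $row['first'],
        $row['last'],
        $row['school'],
        $row['final'],
        $row['division']
    );
}
```

Note that this pattern still requires a seed time, so lines that show only a final time are skipped.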
    

    The regex is very basic and seems to work right now, but when dealing with content you don't control, you may want to account for a lot more. For instance, right now the name matching won't work with names that contain accented characters or punctuation, such as O'Reilly.
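
One possible way to loosen the pattern is sketched below. It is not the original answer's code: parseResultLine() is a hypothetical helper, and it rests on several assumptions, namely that the text is UTF-8 (needed for \p{L} with the u modifier), that each result sits on its own line, and that the age, the seed time, and the trailing standard (DIIA) may each be missing. It matches one line at a time, and lets names contain any letter plus apostrophes, periods, hyphens, and spaces.

```php
<?php
// Hypothetical helper: parse one result line into named fields,
// or return null when the line does not look like a result.
function parseResultLine(string $line): ?array
{
    // x = free-spacing pattern with comments, u = treat input as UTF-8.
    $pattern = '/^\s*
        (?P<position>[\d-]+)\s+                 # place, or dashes when unplaced
        (?P<last>[\p{L}\'\x20.-]+?)\s*,\s*      # last name, up to the comma
        (?P<first>[\p{L}\'\x20.-]+?)\s+         # first name
        (?:(?P<age>\d{1,2})\s+)?                # optional age
        (?P<school>\p{L}[\p{L}\x20&.\'-]*?)\s+  # school name or abbreviation
        (?:(?P<seed>NT|[\d:.]+)\s+)?            # optional seed time
        (?P<final>[\d:.]+)                      # final time
        (?:\s+(?P<division>[A-Z]+))?            # optional standard such as DIIA
        \s*$/xu';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }

    // Normalise groups that did not participate in the match to null.
    $row = [];
    foreach (['position', 'last', 'first', 'age', 'school', 'seed', 'final', 'division'] as $key) {
        $row[$key] = (isset($m[$key]) && $m[$key] !== '') ? $m[$key] : null;
    }
    return $row;
}

// Usage: the first line is from the question, the second is made up
// to exercise the optional fields and a name with punctuation.
$lines = [
    '  1 Donahue, Maura            19 INDY                10:39.77   10:03.60 DIIA',
    "  2 O'Reilly, Zoë GVSU 10:11.12",
];
foreach ($lines as $line) {
    print_r(parseResultLine($line));
}
```

A multi-word first name with no age after it remains ambiguous in this layout, because nothing separates it from the school column, so treat the result as a starting point and check it against the real page.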

    Accepted by the asker as the best answer.