duanluwei9374 2010-07-17 18:34

RegEx negative lookahead is not working!

Hey guys, I am trying to match the address on this page:

http://www.bbb.org/norfolk/business-reviews/tax-return-preparation/liberty-tax-service-in-virginia-beach-va-48000604

The source for the address part contains this HTML:

<tr>
    <td align="right" class="generalinfo_left">Address:</td>
    <td class="generalinfo_right">1 S Main St Ste 1430<br /></td>
</tr>
<tr>
    <td align="right" class="generalinfo_left"></td>
    <td class="generalinfo_right">Dayton, OH 45402</td>
</tr>

So I tried the following RegEx in PHP:

"%Address:</td>(.*?)(?!<br />)</td>%s"

where "s" is the modifier that lets "." match newlines too. But it is not working: it doesn't match the "Dayton, OH 45402" part. Can anyone tell me why?


  • dpql57753 2010-07-17 18:56

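    That is expected. In your sample text, <br /> sits between "Address:" and "Dayton, OH 45402". (?!<br />) is a negative lookahead: it only asserts that the text starting at the current position is not <br />. Placed directly before </td>, it always succeeds, so the lazy (.*?) stops at the first </td>, the one right after <br />, and the match ends before "Dayton, OH 45402" is reached.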
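
To see where the original pattern stops, here is a minimal sketch that runs it against the sample HTML from the question. The variable names ($html, $captured, $endsWithBr, $hasCity) are only illustrative.

```php
<?php
// Sample HTML copied from the question.
$html = <<<'HTML'
<tr>
    <td align="right" class="generalinfo_left">Address:</td>
    <td class="generalinfo_right">1 S Main St Ste 1430<br /></td>
</tr>
<tr>
    <td align="right" class="generalinfo_left"></td>
    <td class="generalinfo_right">Dayton, OH 45402</td>
</tr>
HTML;

// The pattern from the question, unchanged.
$pattern = '%Address:</td>(.*?)(?!<br />)</td>%s';

preg_match($pattern, $html, $m);

// Group 1 ends at the first </td> that follows "Address:</td>".
$captured = $m[1];
$endsWithBr = substr($captured, -6) === '<br />';   // capture still includes <br />
$hasCity = strpos($captured, 'Dayton') !== false;   // city line was never reached

var_dump($endsWithBr);
var_dump($hasCity);
```

The lookahead is checked at the position just before </td>, where the next characters are "</td>" and not "<br />", so it never rejects anything.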

    You should use a parser for HTML.

    That said, assuming that all your files are exactly like this sample, this ugly regex should work:

    %(Address:)(.*?generalinfo_right">)(.*?)((<br />)|(</td>))(.*?generalinfo_right">)(.*?)((<br />)|(</td>))%s
    

    Group 1 is the "Address:" label; groups 3 and 8 contain the two lines of the address.
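
As a rough usage sketch, the pattern above can be applied with preg_match and the two address lines read from groups 3 and 8. The sample HTML is the one from the question; joining the lines with ", " is just an assumption about the desired output.

```php
<?php
// Sample HTML copied from the question.
$html = <<<'HTML'
<tr>
    <td align="right" class="generalinfo_left">Address:</td>
    <td class="generalinfo_right">1 S Main St Ste 1430<br /></td>
</tr>
<tr>
    <td align="right" class="generalinfo_left"></td>
    <td class="generalinfo_right">Dayton, OH 45402</td>
</tr>
HTML;

// The regex from this answer, unchanged.
$pattern = '%(Address:)(.*?generalinfo_right">)(.*?)((<br />)|(</td>))'
         . '(.*?generalinfo_right">)(.*?)((<br />)|(</td>))%s';

$address = null;
if (preg_match($pattern, $html, $m)) {
    // Group 3 is the street line, group 8 is the city/state/ZIP line.
    $address = trim($m[3]) . ', ' . trim($m[8]);
}

echo $address, "\n"; // prints "1 S Main St Ste 1430, Dayton, OH 45402"
```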

    However, since your documents are most likely not all exactly like that, a much better solution is to parse the HTML with a proper parser.
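
One possible way to do that in PHP is with the bundled DOMDocument and DOMXPath classes. The sketch below rests on two assumptions taken from the sample only: the address starts at the row whose left cell reads "Address:", and continuation rows have an empty left cell. The <table> wrapper is added here so the fragment parses as a table; the real page may need a different query.

```php
<?php
// Sample rows from the question, wrapped in <table> for parsing (assumption).
$html = <<<'HTML'
<table>
<tr>
    <td align="right" class="generalinfo_left">Address:</td>
    <td class="generalinfo_right">1 S Main St Ste 1430<br /></td>
</tr>
<tr>
    <td align="right" class="generalinfo_left"></td>
    <td class="generalinfo_right">Dayton, OH 45402</td>
</tr>
</table>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = $xpath->query('//tr[td[@class="generalinfo_left"]]');

$lines = [];
$inAddress = false;
foreach ($rows as $row) {
    $label = trim($xpath->evaluate('string(td[@class="generalinfo_left"])', $row));
    if ($label === 'Address:') {
        $inAddress = true;           // first address row
    } elseif ($label !== '') {
        $inAddress = false;          // any other labelled row ends the address
    }
    if ($inAddress) {
        // Text content of the right-hand cell; <br /> contributes no text.
        $lines[] = trim($xpath->evaluate('string(td[@class="generalinfo_right"])', $row));
    }
}

echo implode(', ', $lines), "\n"; // prints "1 S Main St Ste 1430, Dayton, OH 45402"
```

Unlike the regex, this does not depend on attribute order, whitespace, or whether a line ends in <br />.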

    This answer was accepted by the asker as the best answer.