dongshiliao7990 2012-01-19 03:56

Simple regular expressions on HTML tags

Problem One:

</a>              

19-10-2011, 04:49 PM

             </td> <td class="thread" 

How can I fetch the date and time, i.e. 19-10-2011, 04:49 PM?

Note: the spacing in the snippet above is unstable, e.g. between </td> and <td class

My attempt:

preg_match("#</a>(.*?)</td> <td class=\"thread\"#", $page, $fetchContent);

Result: empty


Problem Two:

<div id="post_message_43345">ANY TYPE OF CONTENT INCLUDING SPACES</tr> <tr>

I need to fetch "ANY TYPE OF CONTENT".

Note: the spacing between tags such as </tr> <tr> can vary from one page to another.

My attempt:

preg_match("#<div id=\"post_message_[a-zA-Z0-9_]*\">(.*?)</tr> <tr>#", $page, $fetchedContent);

Result: empty

I'm looking for a rough, short, temporary snippet for a one-off task, which is why I didn't use an HTML parser.

Any help will be appreciated.

  • douxiong2999 2012-01-19 04:05

    Problem 1

    You need the s modifier (PCRE_DOTALL) so that . also matches newline characters:

    preg_match("#</a>(.*?)</td> <td class=\"thread\"#s", $page, $fetchContent);
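
    To illustrate the effect of the s modifier, here is a minimal sketch. The sample $page string is an assumption modelled on the snippet in the question, and the patterns are written in single quotes only to avoid escaping the double quotes.

    ```php
    <?php
    // Hypothetical sample input mirroring the snippet in the question.
    $page = "</a>\n\n19-10-2011, 04:49 PM\n\n             </td> <td class=\"thread\" ";

    // Without the s modifier, "." stops at newlines, so nothing matches.
    $without = preg_match('#</a>(.*?)</td> <td class="thread"#', $page, $m1);

    // With the s modifier, "." also matches newlines.
    $with = preg_match('#</a>(.*?)</td> <td class="thread"#s', $page, $m2);

    // The captured group still carries the surrounding whitespace, so trim it.
    $dateTime = $with ? trim($m2[1]) : null;

    var_dump($without, $dateTime);
    ```

    Note that the capture includes the blank lines and indentation around the date, which is why trim() is applied afterwards.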
    

    You'd probably be better off matching the date directly though:

    preg_match("#([0123]?[0-9]-(?:0?[1-9]|1[012])-(?:[0-9]{4})),? ?((?:0[0-9]|1[012]):[0-5][0-9] ?[AP]M)#",...)
    

    Edit: this date regex will be a little faster (word boundaries added on either side):

    preg_match("#\\b([0123]?[0-9]-(?:0?[1-9]|1[012])-(?:[0-9]{4}))[, ]{1,3}((?:0[0-9]|1[012]):[0-5][0-9] ?[AP]M)\\b#",...)
    

    For both, assuming $results is the array passed as the third argument, the date is in $results[1] and the time is in $results[2].
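
    As a rough sketch of how the bounded date pattern might be used end to end: the sample $page string is an assumption, $results is the array name used in the answer, and the pattern is written in single quotes so that \b does not need double escaping.

    ```php
    <?php
    // Hypothetical sample input; the spacing between the tags does not matter here.
    $page = "</a>\n\n19-10-2011, 04:49 PM\n\n   </td>   <td class=\"thread\" ";

    // Same pattern as the bounded version above, in a single-quoted string.
    $pattern = '#\b([0123]?[0-9]-(?:0?[1-9]|1[012])-(?:[0-9]{4}))[, ]{1,3}((?:0[0-9]|1[012]):[0-5][0-9] ?[AP]M)\b#';

    $date = null;
    $time = null;
    if (preg_match($pattern, $page, $results)) {
        $date = $results[1]; // first capture group: dd-mm-yyyy
        $time = $results[2]; // second capture group: hh:mm AM/PM
        echo $date, ' ', $time, PHP_EOL;
    }
    ```

    Because this pattern does not depend on the surrounding tags, it keeps working when the spacing between </td> and <td changes.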

    Problem 2

    Again, add the s flag, and put * after the space so that any number of spaces between </tr> and <tr> is allowed.

    preg_match("#<div id=\"post_message_[a-zA-Z0-9_]*\">(.*?)</tr> *<tr>#s", $page, $fetchedContent);
    

    If you also want to allow newlines between </tr> and <tr>, use \s* instead. The same applies to the gap between </td> and <td in Problem 1.
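
    The \s* variant for both problems could look like the sketch below. The two sample strings and the variable names $page1, $page2, $p1 and $p2 are assumptions made for illustration.

    ```php
    <?php
    // Hypothetical inputs with a newline and extra spaces between the tags.
    $page1 = "</a>\n19-10-2011, 04:49 PM\n</td>\n   <td class=\"thread\" ";
    $page2 = "<div id=\"post_message_43345\">ANY TYPE OF CONTENT\nINCLUDING SPACES</tr>\n   <tr>";

    // \s* matches any run of whitespace (spaces, tabs, newlines), including none.
    $p1 = '#</a>(.*?)</td>\s*<td class="thread"#s';
    $p2 = '#<div id="post_message_[a-zA-Z0-9_]*">(.*?)</tr>\s*<tr>#s';

    $dateTime = preg_match($p1, $page1, $m1) ? trim($m1[1]) : null;
    $content  = preg_match($p2, $page2, $m2) ? $m2[1] : null;

    var_dump($dateTime, $content);
    ```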
