douqin1932 2016-06-18 17:19

preg_match_all: reading site source across multiple lines and matching

I read my own website with file_get_contents to display specific text. The page lists interviews, and I want to extract each interview's headline and time to use on another site (as a link to the interview).

The relevant code block is in a table.

<td>
    Interview 1
    <small style="color:gray">
        Persons 2
        Cameras 2
    </small>
</td>
<td>
    1018 min
</td>

As you can see, Interview 1 is the headline and the time is 1018 min. I tried writing a pattern myself, but it got out of hand and matches nothing:

preg_match_all('#<td>\s*(.+?)\s*<small style="color:gray">\s*<\/small>\s*<\/td><td>\s*(.+?)\s*<\/td>#is', $mysite, $match)

I used \s* for the line breaks and spaces and (.+?) to match. What's wrong with my search pattern?
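A minimal reproduction, assuming $mysite holds exactly the sample markup above (in the real script it comes from file_get_contents): the pattern from the question returns a match count of 0.

```php
<?php
// Reproduction only: $mysite is assumed to contain just the sample cells.
$mysite = <<<'HTML'
<td>
    Interview 1
    <small style="color:gray">
        Persons 2
        Cameras 2
    </small>
</td>
<td>
    1018 min
</td>
HTML;

// The pattern exactly as written in the question.
$pattern = '#<td>\s*(.+?)\s*<small style="color:gray">\s*<\/small>\s*<\/td><td>\s*(.+?)\s*<\/td>#is';
$count = preg_match_all($pattern, $mysite, $match);
var_dump($count); // int(0)
```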


3 answers

  • dongshou9343 2016-06-18 17:38

    First, you should use a parser for this; regexes on HTML tend to break whenever the markup changes slightly. There are two issues with your regex, though.
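A minimal parser-based sketch using PHP's built-in DOM extension (DOMDocument and DOMXPath). It assumes each interview is one <tr> whose first <td> holds the headline followed by a <small> with details, and whose second <td> holds the time; the helper name extractInterviews is hypothetical. Only the text nodes that are direct children of the first cell are kept, so the <small> details drop out without any whitespace handling in a pattern.

```php
<?php
// Sketch only: assumes each interview is a <tr> whose first <td> holds the
// headline (followed by a <small> with details) and whose second <td> holds
// the time. extractInterviews() is a hypothetical helper name.
function extractInterviews(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $results = [];
    foreach ($xpath->query('//tr') as $row) {
        $cells = $xpath->query('td', $row);
        if ($cells->length < 2) {
            continue; // not an interview row
        }
        // Headline: only text nodes directly inside the first cell,
        // so the text inside <small> is skipped.
        $headline = '';
        foreach ($xpath->query('text()', $cells->item(0)) as $textNode) {
            $headline .= $textNode->nodeValue;
        }
        $results[] = [
            'headline' => trim($headline),
            'time'     => trim($cells->item(1)->textContent),
        ];
    }
    return $results;
}

// Usage: the sample cells, assumed to sit inside a table row.
$html = <<<'HTML'
<table><tr>
<td>
    Interview 1
    <small style="color:gray">
        Persons 2
        Cameras 2
    </small>
</td>
<td>
    1018 min
</td>
</tr></table>
HTML;

foreach (extractInterviews($html) as $interview) {
    echo $interview['headline'], ' | ', $interview['time'], PHP_EOL;
}
```

Because the parser builds the element tree, line breaks, indentation and the contents of <small> no longer affect the result.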

    Issue one:

    <small style="color:gray">\s*<\/small>
    

    There isn't just whitespace inside that element: it also contains Persons 2 and Cameras 2.

    Issue two:

    <\/td><td>
    

    There is a newline between the </td> and the next <td>, so they are not adjacent.

    So:

    #<td>\s*(.+?)\s*<small style="color:gray">.+?<\/small>\s*<\/td>\s*<td>\s*(.+?)\s*<\/td>#is
    

    should work for you (for this static example), as long as you keep the s modifier so that . also matches line breaks. If the small element's content is optional, change the .+? inside <small> to .*?. Note also that with a parser these wouldn't have been issues.
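The corrected pattern can be checked against the sample markup. This sketch assumes the i and s modifiers from the question are kept (s lets .+? inside <small> cross line breaks) and uses \s* rather than a single \s between the cells, so extra indentation or \r\n line endings also match. PREG_SET_ORDER groups each match's captures together.

```php
<?php
// Sketch: corrected pattern run against the sample cells only.
$mysite = <<<'HTML'
<td>
    Interview 1
    <small style="color:gray">
        Persons 2
        Cameras 2
    </small>
</td>
<td>
    1018 min
</td>
HTML;

// .+? skips the <small> contents; \s* tolerates any whitespace between cells.
$pattern = '#<td>\s*(.+?)\s*<small style="color:gray">.+?<\/small>\s*<\/td>\s*<td>\s*(.+?)\s*<\/td>#is';
$count = preg_match_all($pattern, $mysite, $match, PREG_SET_ORDER);

foreach ($match as $m) {
    // $m[1] is the headline, $m[2] is the time cell's text.
    echo $m[1], ' | ', $m[2], PHP_EOL; // Interview 1 | 1018 min
}
```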

    This answer was selected by the asker as the best answer.
