doushizhou4477 2018-05-25 14:40
Views: 127

How to match text with a regular expression

I have the following text:

<div class="ti"><div class="pic">
        <a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a> (98)
    </div></div><div class="ti"><div class="pic">
        <a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a> (6044)
    </div></div>

How can I use preg_match_all in PHP to get

  1. /categories/rr/1.html

  2. http://www.erty.com/images/440f2d2a.jpg

  3. Ind

  4. 98

for all entries?

I tried

preg_match_all('|[^<div class="ti"><div class="pic">].*?[^<\/div><\/div>]+|',
$test_html,
$out, PREG_PATTERN_ORDER);

But it's not working.


3 answers

  • dqc19941228 2018-05-25 14:46

    Never try to parse HTML with RegExp.

    Since your HTML is probably also well-formed XML, you can load it with a DOM parser instead:

    // Single quotes avoid having to escape the double quotes inside the markup.
    $html = '<div class="ti"><div class="pic"><a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a></div></div><div class="ti"><div class="pic"><a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a></div></div>';
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    // SimpleXML view of the parsed tree; it can then be queried with $sxml->xpath(...).
    $sxml = simplexml_import_dom($doc);
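
The snippet above stops once the document is loaded. As a rough sketch of how the four requested values could then be pulled out, the example below uses DOMXPath on the markup from the question. The array keys (href, src, label, count) are names made up for illustration, and it assumes every entry keeps the same div.ti > div.pic > a structure. A small regular expression is still used, but only to read the digits in parentheses out of already-extracted text, not to parse the HTML itself.

```php
<?php
// Sample markup copied from the question, including the "(98)" counters.
$html = '<div class="ti"><div class="pic">
        <a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a> (98)
    </div></div><div class="ti"><div class="pic">
        <a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a> (6044)
    </div></div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = [];

// One iteration per <div class="pic"> nested in a <div class="ti">.
foreach ($xpath->query('//div[@class="ti"]/div[@class="pic"]') as $pic) {
    $link = $xpath->query('./a', $pic)->item(0);
    if ($link === null) {
        continue; // skip entries that do not match the assumed structure
    }
    $img  = $xpath->query('./img', $link)->item(0);
    $span = $xpath->query('./span', $link)->item(0);

    // The counter is plain text after the link, e.g. " (98)".
    $count = null;
    if (preg_match('/\((\d+)\)/', $pic->textContent, $m)) {
        $count = (int) $m[1];
    }

    $rows[] = [
        'href'  => $link->getAttribute('href'),
        'src'   => $img !== null ? $img->getAttribute('src') : null,
        'label' => $span !== null ? trim($span->textContent) : null,
        'count' => $count,
    ];
}

print_r($rows);
```

Each element of $rows then holds one entry, so $rows[0]['href'] is the first link and $rows[1]['count'] is the second counter. If the real page nests the elements differently, the XPath expressions would need adjusting.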
    

    Or, if you're scraping a website, consider using jQuery-style selectors in a Node.js app.
