dongshi1215 2014-05-26 13:34
Accepted

How to scrape text from inside a class and an element

I'm trying to scrape text from this site. I want to extract aaa-a.nl, abcinkt.nl, accudeals.nl, etc.
Those URLs are in the <ul class="members members-list clearfix"> element, each one inside an <li></li>.
How do I scrape them in PHP?

1 answer

  • dreinuqm992401 2014-05-26 14:05

    Assuming you have already fetched the page (for example with cURL) into a variable $html, you can extract the required elements like this:

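    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml parse warnings out of the output.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    $sxml = simplexml_import_dom($doc);
    if (!$sxml) {
        exit("ERROR: could not import the document into SimpleXML.\n");
    }
    // Match any <ul> whose class attribute contains the token "members-list".
    $nodes = $sxml->xpath("//ul[contains(concat(' ', normalize-space(@class), ' '), ' members-list ')]");
    if (!$nodes) {
        exit("ERROR: no matching <ul> found.\n");
    }
    foreach ($nodes[0]->li as $member) {
        echo (string) $member->a, "\n"; // Text of the <a> inside each <li>
    }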
    

    *Not tested.

    (To understand the XPath query in the code above, see: Getting DOM elements by classname)
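
    The class test in that query is a common XPath 1.0 idiom for matching one class name as a whole token. As a rough sketch of why it is written that way (the sample markup below is made up for illustration), compare it with a plain substring test. Note that the idiom only works when the class name being searched for is padded with a space on each side as well:

    ```php
    <?php
    // Hypothetical sample markup: the second list only shares a substring of the class name.
    $html = '<html><body>'
          . '<ul class="members members-list clearfix"><li><a href="#">aaa-a.nl</a></li></ul>'
          . '<ul class="old-members-list-archive"><li><a href="#">unrelated</a></li></ul>'
          . '</body></html>';

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Plain substring test: also matches "old-members-list-archive".
    $loose = $xpath->query("//ul[contains(@class, 'members-list')]");

    // Token test: pad both the class attribute and the searched name with spaces.
    $strict = $xpath->query("//ul[contains(concat(' ', normalize-space(@class), ' '), ' members-list ')]");

    echo $loose->length, "\n";  // 2
    echo $strict->length, "\n"; // 1
    ```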

    Here I'm using DOMDocument together with SimpleXML. There are several other ways to do this: navigating the DOM with the DOMDocument class alone, combining DOMDocument with DOMXPath, or even using just PHP string functions and regular expressions.
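
    As an illustration of the DOMDocument plus DOMXPath route, here is a minimal sketch. The function name extractMemberNames and the sample markup are assumptions made for this example, and it presumes each <li> wraps the text in an <a>, as the code above does:

    ```php
    <?php
    /**
     * Return the link text of every <li> inside the "members-list" <ul>.
     * The function name is hypothetical, not part of any library.
     */
    function extractMemberNames(string $html): array
    {
        if ($html === '') {
            return []; // loadHTML() rejects an empty string
        }

        // Collect libxml warnings internally instead of printing them.
        $previous = libxml_use_internal_errors(true);
        $doc = new DOMDocument();
        $loaded = $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        if (!$loaded) {
            return [];
        }

        // Select the <a> elements directly, so no nested loop is needed.
        $xpath = new DOMXPath($doc);
        $links = $xpath->query(
            "//ul[contains(concat(' ', normalize-space(@class), ' '), ' members-list ')]/li/a"
        );

        $names = [];
        foreach ($links as $link) {
            $names[] = trim($link->textContent);
        }
        return $names;
    }

    // Made-up sample input standing in for the fetched page.
    $html = '<ul class="members members-list clearfix">'
          . '<li><a href="#">aaa-a.nl</a></li>'
          . '<li><a href="#">abcinkt.nl</a></li>'
          . '</ul>';

    print_r(extractMemberNames($html));
    ```

    Returning an array rather than echoing keeps the parsing separate from whatever you do with the names afterwards.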

