dongtuo1482 2010-09-09 11:52
Views: 109
Accepted

Get the div that contains a search keyword (file_get_contents('url'))

So I'm creating a web crawler and everything works; I have only one problem.

With file_get_contents($page_data["url"]); I get the content of a web page. The page is then scanned to see whether one of my keywords exists on it.

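    $find = $keywords;
    $str = file_get_contents($page_data["url"]);

    // strpos() returns 0 for a match at the very start of the page, so compare strictly against false
    if (strpos($str, $find) !== false)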
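
The check above can be sketched for several keywords at once. This is only a minimal illustration, assuming PHP 7 or later and that the keywords are available as an array of strings; the helper name `findMatchingKeywords` and the sample data are hypothetical and not part of the crawler described here.

```php
<?php
// Hypothetical helper: returns the keywords that occur in the page text.
function findMatchingKeywords(string $pageText, array $keywords): array
{
    $matches = [];
    foreach ($keywords as $keyword) {
        // stripos() is case-insensitive. It returns 0 for a match at the very
        // start of the text, so the result is compared strictly against false.
        if ($keyword !== '' && stripos($pageText, $keyword) !== false) {
            $matches[] = $keyword;
        }
    }
    return $matches;
}

// Sample data, made up for the illustration.
$pageText = 'Hotel deals and cheap flights';
$found = findMatchingKeywords($pageText, ['hotel', 'train', 'FLIGHTS']);
```

Only pages for which the returned array is non-empty would then need to be parsed further.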

When I insert the data into the MySQL database, I only want the content of the div in which the keyword was found.

I know I have to use DOM, but I'm new to DOMDocument.

EXAMPLE: http://crawler.tmp.remote.nl/example.php

4 answers

  • dpoppu4300 2010-09-10 12:58

    I solved the problem with:

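        $doc = new DOMDocument();
        $doc->loadHTML($str);
    
        $xPath = new DOMXPath($doc);
        // XPath 1.0 has no case-insensitive contains(), so translate() upper-cases
        // the node text before it is compared with the upper-cased keyword.
        $xPathQuery = "//text()[contains(translate(., 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), '" . strtoupper($keywords) . "')]";
        $elements = $xPath->query($xPathQuery);
    
        // Print every text node that contains the keyword.
        if ($elements->length > 0) {
            foreach ($elements as $element) {
                print "Found: " . $element->nodeValue . "<br />";
            }
        }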
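
    The query above returns the matching text nodes. To get the div around each match, as the question asks, one possible extension is to append `/ancestor::div[1]` to the query, which selects the nearest enclosing div of every matching text node. The following is a minimal sketch under stated assumptions, not a tested drop-in: it assumes PHP 7 or later, the function name `findDivsContainingKeyword` and the sample HTML are hypothetical, and keywords containing a single quote are rejected because they would break the XPath string literal.

    ```php
    <?php
    // Hypothetical helper: returns the HTML of the nearest enclosing <div> of
    // every text node that contains $keyword (ASCII case-insensitive).
    function findDivsContainingKeyword(string $html, string $keyword): array
    {
        // Assumption: a single quote cannot be embedded safely in the XPath literal below.
        if (strpos($keyword, "'") !== false) {
            throw new InvalidArgumentException('Keyword must not contain a single quote.');
        }

        // Collect libxml parse warnings internally instead of printing them;
        // crawled pages are rarely valid HTML.
        $previous = libxml_use_internal_errors(true);
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $xPath = new DOMXPath($doc);
        $query = "//text()[contains(translate(., 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), '"
            . strtoupper($keyword) . "')]/ancestor::div[1]";

        $divs = [];
        foreach ($xPath->query($query) as $div) {
            // saveHTML($node) serializes only that node, markup included.
            $divs[] = $doc->saveHTML($div);
        }
        return $divs;
    }

    // Sample page, made up for the illustration.
    $html = '<html><body>'
        . '<div id="a"><p>Cheap Flights today</p></div>'
        . '<div id="b"><p>Nothing here</p></div>'
        . '</body></html>';
    $divs = findDivsContainingKeyword($html, 'flights');
    ```

    Each returned string could then be stored in the database in place of the whole page. Text that is not inside any div is skipped by this query.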
    
    This answer was accepted by the asker as the best answer.