dongza6247 2015-09-30 18:17

loadHTML returns empty, but the HTML is fine

I'm trying to grab the href value of an element using PHP, but I'm having some trouble. Here's a snippet of my code.

  <?php
  ini_set("log_errors", 1);
  ini_set("error_log", "php-error.log");
  $target_url = "http://foo.bar";
  $request = $target_url;
  $html = $this->scraper($request);
  $dom = new DOMDocument();
  $dom->loadHTML($html);
  // Error point - $dom is empty
  error_log("dom:");
  error_log($dom);
  $xpath = new DOMXPath($dom);
  error_log("setting target url");
  $target_url = $xpath->query("//*[@class='foo_bar']/href");
  ?>

Logging $html results in the standard, full HTML output of the page. A search shows that my xpath should work. However, when I try to log $dom after loadHTML, I get a blank result. I've been struggling for a few hours trying to work out why, but with no luck.

Does anyone have any ideas/anything I could try?

Edited to add console output:

    [30-Sep-2015 13:51:59 America/New_York] dom:
    [30-Sep-2015 13:51:59 America/New_York] setting target url

1 answer

  • dsapkqaduj6718493 2015-10-05 18:42

    First check that the HTML was actually loaded into the DOM. You can use a debugger, logging, or var_dump() for that.

    var_dump($dom->saveXML());

    If it wasn't loaded into the DOM, take a step back and verify that the scraper actually fetched the HTML.

    var_dump($html);
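
Both checks can be combined into one short sketch. It assumes that the blank log line in the question comes from passing the DOMDocument object itself to error_log(), which expects a string. The sample markup, URL, and variable names are placeholders, not the real page.

```php
<?php
// Hypothetical stand-in for the scraper result from the question.
$html = '<html><body><a class="foo_bar" href="http://foo.bar/page">Link</a></body></html>';

// Collect libxml parse errors instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html); // true on success, false on failure

// Log anything the parser complained about, then reset the error buffer.
foreach (libxml_get_errors() as $error) {
    error_log('libxml: ' . trim($error->message));
}
libxml_clear_errors();

// Serialize the document to a string before logging it;
// the DOMDocument object itself cannot be logged directly.
$serialized = $dom->saveHTML();
error_log('dom: ' . $serialized);
```

If the logged markup contains the expected element, the loading step works and the remaining problem is in the XPath expression.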

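    If the HTML was loaded into the DOM, you will still need to fix the XPath expression. The /href step in the question looks for a child element named href, but I would expect href to be an attribute node, which XPath addresses with @: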

    //*[@class='foo_bar']/@href

    You seem to want to read it as a string value, so cast it:

    string(//*[@class='foo_bar']/@href)

    That only works with DOMXPath::evaluate(); DOMXPath::query() can only return node lists.

    $target_url = $xpath->evaluate("string(//*[@class='foo_bar']/@href)");
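
For comparison, here is a minimal sketch of both calls against made-up markup. The class name comes from the question; the URL and link text are placeholders. query() returns a DOMNodeList of attribute nodes that have to be read one at a time, while evaluate() with string() returns the value of the first match as a plain string, or an empty string when nothing matches.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<a class="foo_bar" href="http://foo.bar/page">Link</a>');
$xpath = new DOMXPath($dom);

// query(): always a node list, so pick the first attribute node by hand.
$nodes = $xpath->query("//*[@class='foo_bar']/@href");
$fromQuery = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;

// evaluate(): the XPath string() function yields a PHP string directly.
$fromEvaluate = $xpath->evaluate("string(//*[@class='foo_bar']/@href)");

// string() on an empty node set yields an empty string, not null or false.
$missing = $xpath->evaluate("string(//*[@class='nope']/@href)");

var_dump($fromQuery, $fromEvaluate, $missing);
```

Because a failed match comes back as an empty string, it is worth checking for '' before using the result as a URL.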
    

    A small example:

    $document = new DOMDocument();
    $document->loadHTML('<a href="http://example.com">Example</a>');
    $xpath = new DOMXPath($document);
    var_dump($xpath->evaluate('string(//a[1]/@href)'));
    

    Output:

    string(18) "http://example.com"
    