doushi5913 2017-04-07 17:54

What's wrong with this HTML DOM PHP code?

I'm trying to write code that prints the contents of every element with itemprop="price" found on a set of linked pages, but it doesn't work and I can't figure out why. This is the code:

<?php
error_reporting(0);
ini_set('display_errors', 0);
$doc      = new DOMDocument();
$allscan  = array(
    'http://www.mobile54.co.il/30786',
    'http://www.mobile54.co.il/35873',
    'http://www.mobile54.co.il/34722'
);
$alllinks = array();
$html     = file_get_contents($allscan[0]);
$doc->loadHTML($html);
$href = $doc->getElementsByTagName('a');
for ($j = 0; $j < count($allscan); $j++) {
    $html = file_get_contents($allscan[$j]);
    $doc->loadHTML($html);
    $href = $doc->getElementsByTagName('a');
    for ($i = 0; $i < $href->length; $i++) {
        $link = $href->item($i)->getAttribute("href");
        $lin  = preg_replace('/\s+/', '', 'http://www.mobile54.co.il' . $link . "<br />");
        if (strpos($link, 'items/') && !strpos($link, '#techDetailsAName')) {
            if (!in_array($lin, $alllinks)) {
                $alllinks[] = $lin;
            }
        }
    }
}

for ($i = 0; $i < count($alllinks); $i++) {
    echo $alllinks[$i];
}
for ($i = 0; $i < count($alllinks); $i++) {
    $lin  = "$alllinks[$i]";
    $html = file_get_contents($lin);
    $doc->loadHTML('<?xml encoding="UTF-8"?>' . $html);
    $span = $doc->getElementsByTagName('span');
    for ($j = 0; $j < $span->length; $j++) {
        $attr = $span->item($j)->getAttribute('itemprop');
        if ($attr == "price") {
            echo $span->item($j)->textContent . "<br />";
        }
    }
}


?> 

When I paste "someurl" instead of $lin it works, but the other way it doesn't. I've also tried $html = file_get_contents($alllinks[$i]); but that didn't work either, and I don't know why.


1 answer

  • duanran3115 2017-04-07 18:48

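    I think your problem is most likely the <br /> you append to each URL before storing it in $alllinks. After the whitespace is stripped, file_get_contents() is asked for an address ending in <br/>, which most likely fails, and because of error_reporting(0) you never see the warning that would tell you so. Beyond that, there are a lot of opportunities to improve your code by using XPath. (Note also that you can pass a URL directly to DOMDocument::loadHTMLFile() instead of calling file_get_contents() first.)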
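A minimal offline sketch of the difference, assuming a hypothetical href value of /items/12345 (not taken from the real site). It rebuilds the link exactly as the question's preg_replace line does, then builds it with the line break kept out of the stored URL. It also compares strpos() strictly against false, because strpos() returns 0, which is falsy, when the match is at the very start of the string.

```php
<?php
// Hypothetical href as it might appear in a page; not taken from the real site.
$link = " /items/12345 ";

// What the question's code stores: "<br />" becomes part of the URL, and the
// whitespace removal turns it into "<br/>", so the fetch targets a bogus address.
$broken = preg_replace('/\s+/', '', 'http://www.mobile54.co.il' . $link . "<br />");

// Fix: store only the URL; add the line break when echoing, not when storing.
$fixed = 'http://www.mobile54.co.il' . trim($link);

// strpos() returns 0 for a match at position 0, which is falsy, so compare strictly.
$isItemLink = strpos($fixed, 'items/') !== false
    && strpos($fixed, '#techDetailsAName') === false;

echo $broken, "\n";      // http://www.mobile54.co.il/items/12345<br/>
echo $fixed, "<br />\n"; // the <br /> only exists in the output
var_dump($isItemLink);   // bool(true)
```

The design choice is simply to keep data and presentation apart: $alllinks should hold plain URLs, and "<br />" belongs in the echo statement.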

    First, we select the href attribute of every <a> element whose href contains items/ but not #techDetailsAName. Then we load each of those URLs, find the elements whose itemprop attribute is exactly price, and read their text content.
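The two XPath steps can be tried without any network access by feeding DOMDocument::loadHTML() a small inline document instead of a downloaded page. The markup below is made up for illustration; the real pages may be structured differently.

```php
<?php
// Made-up HTML standing in for a downloaded page; the real markup may differ.
$html = <<<HTML
<html><body>
  <a href="/items/101">Phone A</a>
  <a href="/items/101#techDetailsAName">Phone A specs</a>
  <a href="/about">About</a>
  <div><span itemprop="price">1,299</span></div>
  <span itemprop="priceCurrency">ILS</span>
</body></html>
HTML;

libxml_use_internal_errors(true); // keep libxml warnings about sloppy HTML quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Step 1: href values that contain "items/" but not "#techDetailsAName".
$hrefs = [];
$query = "//a[contains(@href, 'items/') and not(contains(@href, '#techDetailsAName'))]/@href";
foreach ($xpath->query($query) as $attr) {
    $hrefs[] = trim($attr->value);
}

// Step 2: elements whose itemprop is exactly "price"
// (an exact comparison, so itemprop="priceCurrency" is not matched).
$prices = [];
foreach ($xpath->query("//*[@itemprop='price']") as $node) {
    $prices[] = trim($node->textContent);
}

echo implode(", ", $hrefs), "\n";  // /items/101
echo implode(", ", $prices), "\n"; // 1,299
```

Compared with looping over every <span> and calling getAttribute('itemprop'), the XPath predicate matches any element type and does the filtering in one query.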

    <?php
    $url = "http://www.mobile54.co.il/30786";
    $prices = [];
    $hrefs = [];
    $combined = [];
    
    $dom = new DomDocument;
    libxml_use_internal_errors(true); // don't print warnings about malformed HTML
    $dom->loadHtmlFile($url);
    $xpath = new DomXPath($dom);
    // get the href of each <a> element whose href contains items/ but not #techDetailsAName
    $nodes = $xpath->query("//a[contains(@href, 'items/') and not(contains(@href, '#techDetailsAName'))]/@href");
    foreach ($nodes as $node) {
        $hrefs[] = trim($node->value);
    }
    $hrefs = array_values(array_unique($hrefs)); // the same link can appear several times
    
    // now you have a list of site-relative URLs
    foreach ($hrefs as $k => &$href) {
        $href = "http://www.mobile54.co.il$href";
        $dom->loadHtmlFile($href);
        $xpath = new DomXPath($dom);
        // get any element whose itemprop is exactly "price"
        $nodes = $xpath->query("//*[@itemprop='price']");
        // item(0) is null when nothing matches, so check before reading textContent
        $prices[$k] = $nodes->length > 0 ? $nodes->item(0)->textContent : null;
    }
    unset($href); // break the reference left over from the by-reference foreach
    
    // now you have $hrefs and $prices; combine them:
    foreach ($hrefs as $k => $v) {
        $combined[$k] = [$v, $prices[$k]];
    }
    print_r($combined);
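The loop above keeps only the first match per page (item(0)), while the question asks for the contents of all itemprop="price" elements. A sketch of a parsing helper that returns every match, and an empty array when there is none, might look like this. The function name extractPrices is hypothetical, and it parses an HTML string rather than fetching a URL so it can be tried offline. The XML-declaration prefix is the question's own trick for telling libxml the markup is UTF-8; it is assumed, not guaranteed, to be honored by the libxml version in use.

```php
<?php
/*
 * Hypothetical helper: return the trimmed text of every element whose
 * itemprop is exactly "price" in one page's HTML (empty array if none).
 * The prefix prepended below is the same one the question uses so that
 * libxml reads the markup as UTF-8; this assumes the pages are UTF-8.
 */
function extractPrices(string $html): array
{
    $previous = libxml_use_internal_errors(true); // silence malformed-HTML warnings
    $dom = new DOMDocument();
    $dom->loadHTML('<?xml encoding="UTF-8"?>' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $prices = [];
    foreach ((new DOMXPath($dom))->query("//*[@itemprop='price']") as $node) {
        $prices[] = trim($node->textContent);
    }
    return $prices;
}

// Offline usage with made-up markup.
$page = '<div><span itemprop="price">1,299</span><span itemprop="price"> 999 </span></div>';
print_r(extractPrices($page));
print_r(extractPrices('<p>no price here</p>'));
```

In the real script it would be called as extractPrices(file_get_contents($url)) for each collected URL, after checking that file_get_contents() did not return false.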
    
    This answer was selected by the asker as the best answer.
