doujin8476 2016-11-17 19:45

Extracting href from an HTML page using PHP

I'm trying to extract the news headlines and the link (href) of each headline using the code below, but the link extraction is not working: it only gets the headline. Please help me find out what's wrong with the code.

Page I want to get the headlines and links from: http://web.tmxmoney.com/news.php?qm_symbol=BCM

<?php
$data = file_get_contents('http://web.tmxmoney.com/news.php?qm_symbol=BCM');
$dom = new domDocument;
@$dom->loadHTML($data);
$dom->preserveWhiteSpace = true;
$xpath = new DOMXPath($dom);
$rows = $xpath->query('//div');

foreach ($rows as $row) {

    $cols = $row->getElementsByTagName('span');

    $newstitle = $cols->item(0)->nodeValue;

    $link = $cols->item(0)->nodeType === HTML_ELEMENT_NODE ? $cols->item(0)->getElementsByTagName('a')->item(0)->getAttribute('href') : '';

    echo $newstitle . '<br>';
    echo $link . '<br><br>';
}
?>

Thanks in advance for your help!
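The empty `$link` in the question's code comes from the ternary test. `HTML_ELEMENT_NODE` is not one of PHP's DOM constants; the element node type is `XML_ELEMENT_NODE`. On PHP 7 and earlier an undefined constant is treated as a string of its own name (with a notice or warning), so `nodeType === HTML_ELEMENT_NODE` compares an integer with a string, is always false, and `$link` is always `''`. PHP 8 throws an `Error` instead. Swapping in `XML_ELEMENT_NODE` fixes that test, but it is simpler to let XPath select the links directly. The sketch below assumes, because the real page's markup is not shown, that each headline is an `<a>` inside a `<span>`; the inline `$html` sample is hypothetical and stands in for the downloaded page.

```php
<?php
// ASSUMPTION: hypothetical markup where each headline is <span><a href="...">.
// The real page's structure may differ; adjust the XPath to match it.
$html = '<html><body>'
      . '<div><span><a href="/news/1">First headline</a></span></div>'
      . '<div><span><a href="/news/2">Second headline</a></span></div>'
      . '<div><span>No link here</span></div>'
      . '</body></html>';

$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$headlines = [];
// Select only <a> elements that have an href and sit directly inside a <span>.
foreach ($xpath->query('//span/a[@href]') as $a) {
    $headlines[] = [
        'title' => trim($a->textContent),   // headline text
        'href'  => $a->getAttribute('href'), // link target, as written in the page
    ];
}

foreach ($headlines as $h) {
    echo htmlspecialchars($h['title']) . '<br>';
    echo htmlspecialchars($h['href']) . '<br><br>';
}
```

To use it on the real page, replace `$html` with the string returned by `file_get_contents()` and change the XPath expression to match the page's actual markup.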



  • ds19891231 2016-11-17 20:52

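    Try this instead. It selects every <a> element in the page body with XPath and prints each href that is a valid absolute URL: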

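    <?php
      $data = file_get_contents('http://web.tmxmoney.com/news.php?qm_symbol=BCM');
    
      $dom = new DOMDocument();
      @$dom->loadHTML($data);                   // @ hides libxml warnings about malformed HTML
      $xpath = new DOMXPath($dom);
      $hrefs = $xpath->query('/html/body//a');  // every <a> element inside <body>
    
      foreach ($hrefs as $href) {
        // Remove characters that are not allowed in URLs.
        $url = filter_var($href->getAttribute('href'), FILTER_SANITIZE_URL);
    
        // FILTER_VALIDATE_URL needs an absolute URL, so relative links are skipped here.
        if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
          $safe = htmlspecialchars($url);         // escape before writing into HTML
          echo '<a href="' . $safe . '">' . $safe . '</a><br />';
        }
      }
    ?>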
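`FILTER_VALIDATE_URL` only accepts absolute URLs, so this check silently skips links written as relative paths such as `/news/1`, which many sites use for their own pages. One option is to resolve each href against the page URL before validating it. The `resolve_href()` function below is a hypothetical helper written for this example, not a PHP built-in, and it is deliberately simplified: it does not collapse `./` or `../` segments and does not handle query-only or fragment-only hrefs.

```php
<?php
// Hypothetical helper (not a PHP built-in): resolve $href against the URL of
// the page it was found on. Simplified: handles absolute, protocol-relative,
// root-relative and plain relative hrefs only; "./", "../", "?query" and
// "#fragment" hrefs are not handled correctly.
function resolve_href(string $base, string $href): string
{
    if (parse_url($href, PHP_URL_SCHEME) !== null) {
        return $href;                                 // already absolute
    }
    $p = parse_url($base);
    $scheme = $p['scheme'];
    $host = $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href;                 // protocol-relative
    }
    if (strpos($href, '/') === 0) {
        return $scheme . '://' . $host . $href;       // root-relative
    }
    // Plain relative: keep the base path up to and including its last "/".
    $path = $p['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $scheme . '://' . $host . $dir . $href;
}

echo resolve_href('http://web.tmxmoney.com/news.php?qm_symbol=BCM', '/news/1'), "\n";
// prints http://web.tmxmoney.com/news/1
```

Inside the answer's loop, pass each `$href->getAttribute('href')` through `resolve_href()` with the page URL as `$base` before the `filter_var()` checks; relative headline links then survive validation.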
    
