dtqf81594 2016-09-15 10:22

I want to use a PHP crawler to get a specific URL from this document

I have no idea how to approach this, and I'm probably going to get some downvotes.

I have a web page similar to this:

<li class="specific-class">
    <a href="http://unknown-url.com">Unknown Link</a>
</li>

The page I want to crawl is filled with many other elements that I'm not interested in.

I want to retrieve only the href attribute of the anchor tag inside that li element, and nothing else. Then I'll follow the link to another web page that contains something like this:

<h1 class="specific-class">Blah Blah Blah</h1>

So at the end of it all, I'll get whatever is in the h1 element:

Blah Blah Blah

If you could help me get around this, I'd greatly appreciate it. Any APIs would also do nicely.

I have this piece of code that gets attributes from elements, but I haven't been able to make it search only within a specific element.

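<?php
// simple_html_dom.php is the third-party "PHP Simple HTML DOM Parser" library
include_once('simple_html_dom.php');

$target_url = "https://www.google.com/";
$html = new simple_html_dom();
$html->load_file($target_url);

// find('a') matches every <a> on the page, not only those inside <li class="specific-class">
foreach ($html->find('a') as $link) {
    echo $link->href . "<br>";
}
?>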
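A minimal sketch of the two steps described in the question, assuming PHP 7.1+ with the bundled DOM extension (DOMDocument and DOMXPath) in place of simple_html_dom. The XPath expressions restrict the search to a elements inside li class="specific-class" and to h1 class="specific-class". The names xpath_values, crawl, LINK_XPATH and HEADING_XPATH, and the injectable $fetch parameter, are hypothetical; by default file_get_contents is used, which needs allow_url_fopen enabled for http(s) URLs.

```php
<?php
// Sketch: two-step crawl with PHP's built-in DOM extension (no simple_html_dom).
// Function and constant names are hypothetical, chosen for illustration.

// <a href> inside <li class="specific-class">; the concat/normalize-space trick
// matches the class as a whole word even when the element has several classes.
const LINK_XPATH = "//li[contains(concat(' ', normalize-space(@class), ' '), ' specific-class ')]//a/@href";
// <h1 class="specific-class"> on the followed page.
const HEADING_XPATH = "//h1[contains(concat(' ', normalize-space(@class), ' '), ' specific-class ')]";

// Parse an HTML string and return the trimmed text of every node matching $expr.
// For attribute nodes (like @href) the text is the attribute value.
function xpath_values(string $html, string $expr): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $values = [];
    foreach ((new DOMXPath($doc))->query($expr) as $node) {
        $values[] = trim($node->textContent);
    }
    return $values;
}

// Fetch the start page, follow each matching link, and map URL => first heading.
// $fetch defaults to file_get_contents (requires allow_url_fopen for http/https).
function crawl(string $startUrl, ?callable $fetch = null): array
{
    $fetch = $fetch ?? 'file_get_contents';
    $headings = [];
    foreach (xpath_values((string) $fetch($startUrl), LINK_XPATH) as $url) {
        // A failed fetch (false) becomes '' and simply yields no heading.
        $found = xpath_values((string) $fetch($url), HEADING_XPATH);
        if ($found !== []) {
            $headings[$url] = $found[0];
        }
    }
    return $headings;
}
```

Passing a fetcher callback keeps the parsing testable without network access. Relative hrefs (for example /page) would still need to be resolved against the start URL before fetching; the sketch passes them through unchanged.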
1 answer

  • dongmeng2687 2016-09-15 10:32

    Please read about DOMDocument. You can use methods such as getElementsByTagName, getElementById, etc.
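The answer's pointer can be fleshed out roughly as follows. This is a minimal sketch, assuming PHP 7.1+ with the DOM extension; the helper names load_html, has_class, links_in_items and first_heading are hypothetical. getElementById is not used because the markup in the question identifies elements by class rather than id, and DOMDocument has no built-in lookup by class, so the class attribute is checked by hand.

```php
<?php
// Sketch of the answer's suggestion: DOMDocument + getElementsByTagName().
// Helper names are hypothetical.

// Parse HTML leniently; an empty string yields an empty document.
function load_html(string $html): DOMDocument
{
    $doc = new DOMDocument();
    if (trim($html) !== '') {
        $previous = libxml_use_internal_errors(true); // hide warnings about invalid markup
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
    return $doc;
}

// True if $class is one of the whitespace-separated names in the class attribute.
function has_class(DOMElement $el, string $class): bool
{
    $classes = preg_split('/\s+/', trim($el->getAttribute('class')));
    return in_array($class, $classes, true);
}

// href of every <a> nested in an <li> carrying $class.
function links_in_items(DOMDocument $doc, string $class): array
{
    $links = [];
    foreach ($doc->getElementsByTagName('li') as $li) {
        if (!has_class($li, $class)) {
            continue;
        }
        // Called on the <li>, getElementsByTagName() searches only its descendants.
        foreach ($li->getElementsByTagName('a') as $a) {
            if ($a->hasAttribute('href')) {
                $links[] = $a->getAttribute('href');
            }
        }
    }
    return $links;
}

// Trimmed text of the first <h1> carrying $class, or null if there is none.
function first_heading(DOMDocument $doc, string $class): ?string
{
    foreach ($doc->getElementsByTagName('h1') as $h1) {
        if (has_class($h1, $class)) {
            return trim($h1->textContent);
        }
    }
    return null;
}
```

In the question's flow, links_in_items() would run on the first page, and first_heading() on each page fetched from the returned URLs (for example with file_get_contents).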
