I want to use a PHP crawler to get a specific URL from this document

I have no idea how to approach this, and I'm probably going to get some downvotes.

I have a web page similar to this:

<li class="specific-class">
    <a href="http://unknown-url.com">Unknown Link</a>
</li>

I want to crawl a page filled with several other elements I'm not interested in retrieving.

I want to retrieve only the href attribute of the anchor tag within the li element, and nothing else. I will then follow that link and fetch another web page that contains something like this:

<h1 class="specific-class">Blah Blah Blah</h1>

So at the end of it all, I'll get whatever is in the h1 element:

Blah Blah Blah

If you could help me with this, I'd greatly appreciate it. Any APIs will do nicely, too.

I have this piece of code that gets attributes from an element, but I haven't been able to get it to crawl elements nested within a specific element.

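<?php
// Requires the PHP Simple HTML DOM Parser library (simple_html_dom.php).
include_once('simple_html_dom.php');

$target_url = "https://www.google.com/";

// Download and parse the page.
$html = new simple_html_dom();
$html->load_file($target_url);

// Print the href of every anchor on the page.
foreach ($html->find('a') as $link) {
    echo $link->href . "<br>";
}
?>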

1 Answer

dongmeng2687 2016-09-15 10:32
Please read about DOMDocument. You can use methods such as getElementsByTagName and getElementById.
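
As a rough illustration of that suggestion, here is a minimal sketch that uses only DOMDocument and getElementsByTagName to do both steps from the question: collect the href values inside li.specific-class, then read the text of h1.specific-class. The function names (parseHtml, hasClass, extractLinks, extractHeadings), the sample markup and the ignored.example URL are hypothetical and exist only for this example. The HTML is passed in as strings so the sketch runs without network access.

```php
<?php
// Hypothetical helper: parses an HTML string into a DOMDocument.
function parseHtml(string $html): DOMDocument
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    if ($html !== '') {
        $doc->loadHTML($html);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

// Hypothetical helper: true if the element's class attribute contains $class.
function hasClass(DOMElement $element, string $class): bool
{
    $classes = preg_split('/\s+/', trim($element->getAttribute('class')));
    return in_array($class, $classes, true);
}

// Step 1: collect the href of every <a> inside an <li> with the given class.
function extractLinks(string $html, string $class = 'specific-class'): array
{
    $doc = parseHtml($html);
    $links = [];
    foreach ($doc->getElementsByTagName('li') as $li) {
        if (!hasClass($li, $class)) {
            continue; // skip list items we are not interested in
        }
        foreach ($li->getElementsByTagName('a') as $anchor) {
            if ($anchor->hasAttribute('href')) {
                $links[] = $anchor->getAttribute('href');
            }
        }
    }
    return $links;
}

// Step 2: collect the text of every <h1> with the given class.
function extractHeadings(string $html, string $class = 'specific-class'): array
{
    $doc = parseHtml($html);
    $headings = [];
    foreach ($doc->getElementsByTagName('h1') as $h1) {
        if (hasClass($h1, $class)) {
            $headings[] = trim($h1->textContent);
        }
    }
    return $headings;
}

// Sample markup modelled on the question (the second <li> is made up).
$listPage = '<ul>'
    . '<li class="specific-class"><a href="http://unknown-url.com">Unknown Link</a></li>'
    . '<li class="other"><a href="http://ignored.example">Other</a></li>'
    . '</ul>';
$detailPage = '<h1 class="specific-class">Blah Blah Blah</h1>';

print_r(extractLinks($listPage));
print_r(extractHeadings($detailPage));
```

To run this against live pages, you could fetch each page with file_get_contents($url) (which needs allow_url_fopen enabled) or with cURL, pass the result to extractLinks(), then fetch each returned URL and pass that page to extractHeadings().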
