How to get all the webpages on a domain

I am making a simple web spider, and I was wondering if there is something I can call from my PHP code to get all the webpages on a domain...

E.g., let's say I wanted to get all the webpages on Stackoverflow.com. That means it would get:

https://stackoverflow.com/questions/ask
pulling webpages from an adult site -- how to get past the site agreement?
https://stackoverflow.com/questions/1234214/
Best Rails HTML Parser

And all the links. How can I get that? Or is there an API or directory that would let me get that?

Also, is there a way I can get all the subdomains?

Btw, how do crawlers crawl websites that don't have sitemaps or syndication feeds?

Cheers.

5 answers

dongse3348 2012-12-17 21:21
If a site wants you to be able to do this, they will probably provide a Sitemap. Using a combination of a sitemap and following the links on pages, you should be able to traverse all the pages on a site - but this is really up to the owner of the site, and how accessible they make it.
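
As a rough illustration of the sitemap half of that approach, here is a minimal PHP sketch that pulls the URLs out of a sitemap document. The function name `extractSitemapUrls` and the `example.com` addresses are made up for this example; it assumes the sitemap follows the usual sitemaps.org layout, where each URL sits in a `<loc>` element, and it parses an inline string rather than downloading anything.

```php
<?php

/**
 * Return every <loc> URL found in a sitemap XML string.
 * Both <urlset> and <sitemapindex> documents keep their URLs in <loc>
 * elements, so the same function covers both (hypothetical helper).
 */
function extractSitemapUrls(string $xml): array
{
    $doc = new DOMDocument();
    // loadXML() rejects empty input, and malformed XML only produces
    // warnings, so both cases are treated as "no URLs".
    if ($xml === '' || !@$doc->loadXML($xml)) {
        return [];
    }

    $urls = [];
    foreach ($doc->getElementsByTagName('loc') as $loc) {
        $url = trim($loc->textContent);
        if ($url !== '') {
            $urls[] = $url;
        }
    }
    return $urls;
}

// Inline sample sitemap; example.com is a placeholder domain.
$sample = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
XML;

print_r(extractSitemapUrls($sample));
```

In practice you would feed it the body of the site's sitemap file, if the owner publishes one, and use the result to seed the list of pages to visit.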

If the site does not want you to do this, there is nothing you can do to work around it. HTTP does not provide any standard mechanism for listing the contents of a directory.
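
Because there is no listing to ask for, a crawler without a sitemap can only discover pages by following links from pages it already knows. Here is a minimal breadth-first sketch of that idea in PHP. Everything in it is illustrative: `extractSameHostLinks`, `crawl`, the `$fetch` callback and the in-memory `$site` array are assumptions for the example, and link resolution is deliberately simplified to absolute URLs and root-relative paths (`/path`), ignoring ports and other relative forms.

```php
<?php

/** Collect links in $html that point to the same host as $baseUrl. */
function extractSameHostLinks(string $html, string $baseUrl): array
{
    $host = parse_url($baseUrl, PHP_URL_HOST);
    $scheme = parse_url($baseUrl, PHP_URL_SCHEME) ?: 'https';

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so parser warnings are suppressed.
    if ($html === '' || !@$doc->loadHTML($html)) {
        return [];
    }

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // Drop the #fragment: it names a spot on a page, not a new page.
        $href = explode('#', trim($a->getAttribute('href')), 2)[0];
        if ($href === '') {
            continue;
        }
        // Simplification: only root-relative paths are resolved.
        if ($href[0] === '/' && substr($href, 0, 2) !== '//') {
            $href = $scheme . '://' . $host . $href;
        }
        if (parse_url($href, PHP_URL_HOST) === $host) {
            $links[$href] = true; // array keys de-duplicate
        }
    }
    return array_keys($links);
}

/** Breadth-first crawl from $startUrl; $fetch returns HTML or null. */
function crawl(string $startUrl, callable $fetch, int $maxPages = 100): array
{
    $queue = [$startUrl];
    $seen = [$startUrl => true];
    $visited = [];

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        $html = $fetch($url);
        if ($html === null) {
            continue; // fetch failed or page missing
        }
        $visited[] = $url;
        foreach (extractSameHostLinks($html, $url) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $visited;
}

// In-memory stand-in for a website, so the sketch runs offline.
$site = [
    'https://example.com/'       => '<a href="/a">A</a> <a href="https://other.test/x">ext</a>',
    'https://example.com/a'      => '<a href="/b#top">B</a> <a href="/">home</a>',
    'https://example.com/b'      => '<p>no links</p>',
    'https://example.com/orphan' => '<p>never linked</p>',
];
$fetch = fn(string $url): ?string => $site[$url] ?? null;

print_r(crawl('https://example.com/', $fetch));
```

Note that `/orphan` is never found: a page nobody links to is invisible to this kind of crawl, which is exactly the limitation described above. For a real site, `$fetch` could wrap `file_get_contents` or cURL instead of the array lookup.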

This answer was chosen by the asker as the best answer.