doujiaci7976 2012-04-05 09:34

How do I find files in a website's directories?

I'm creating a web crawler. I'm going to give it a URL, and it will scan through the directory and subdirectories for .html files. I've been looking at two alternatives:

  1. scandir($url). This works on local files but not on HTTP sites. Is this because of file permissions? I'm guessing it shouldn't work, since it would be dangerous for everyone to have access to your website's files.

  2. Searching for links and following them. I can call file_get_contents on the index file, find the links, and then follow them to their .html files.

Does either of these two work, or is there a third alternative?
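
For illustration, option 2 might start out like the minimal sketch below. It assumes PHP with the DOM extension enabled, and the function names `extractLinks` and `filterHtmlLinks` are hypothetical helpers, not part of any library. It works on an HTML string that has already been downloaded (for example with file_get_contents), collects the href of every `<a>` element, and keeps the ones whose path ends in .html.

```php
<?php
// Hypothetical helper: return the unique href values of all <a> elements
// found in an HTML string.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is often malformed, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

// Hypothetical helper: keep only links whose path ends in .html,
// ignoring any query string or fragment.
function filterHtmlLinks(array $links): array
{
    return array_values(array_filter($links, function (string $link): bool {
        $path = parse_url($link, PHP_URL_PATH);
        return is_string($path) && preg_match('/\.html$/i', $path) === 1;
    }));
}
```

The links returned here are still relative to the page they came from, so they would need to be resolved against that page's URL before being fetched.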

2 answers

  • dtvhqlc57127 2012-04-05 09:39

    The only way to look for HTML files is to parse through the content returned by the server. Unless, by some small chance, directory browsing is enabled on the server (it is usually one of the first things to be disabled), you cannot browse directory listings. You only have access to the content the site is prepared to show you and let you use.

    You would have to start at http://www.mysite.com and work onwards, scanning for links to HTML files. Also consider: what if the site has ASP, PHP, or other files that return HTML content?
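
A rough sketch of that approach, under some stated assumptions: PHP 7.1 or later with the DOM extension, and two hypothetical functions, `resolveUrl` and `crawlSite`, that are not part of any library. The fetcher is passed in as a callable so the crawl logic can be tried without network access. The URL resolution is deliberately simplified (it does not normalize `../` segments), and links are not filtered by extension, since a .php or .asp URL may return HTML just as well.

```php
<?php
// Hypothetical helper: turn an href found on page $base into an absolute
// http(s) URL, or null if it should be skipped. Simplified on purpose:
// "../" segments are not normalized.
function resolveUrl(string $base, string $href): ?string
{
    $href = trim(explode('#', $href, 2)[0]); // drop any fragment
    if ($href === '') {
        return null;
    }
    if (preg_match('#^https?://#i', $href) === 1) {
        return $href; // already absolute
    }
    if (preg_match('#^[a-z][a-z0-9+.\-]*:#i', $href) === 1) {
        return null; // mailto:, javascript:, ftp: and similar
    }
    $parts = parse_url($base);
    if (!isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    if (substr($href, 0, 2) === '//') {
        return $parts['scheme'] . ':' . $href; // protocol-relative link
    }
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if ($href[0] === '/') {
        return $origin . $href; // root-relative link
    }
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href; // relative to the current directory
}

// Hypothetical sketch: breadth-first crawl that stays on the start URL's host.
// $fetch takes a URL and returns the page body, or null on failure.
function crawlSite(string $startUrl, callable $fetch, int $maxPages = 50): array
{
    $host = parse_url($startUrl, PHP_URL_HOST);
    $queue = [$startUrl];
    $visited = [];

    while ($queue !== [] && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;
        }
        $visited[$url] = true;

        $html = $fetch($url);
        if (!is_string($html) || trim($html) === '') {
            continue; // nothing to parse
        }

        $doc = new DOMDocument();
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        foreach ($doc->getElementsByTagName('a') as $anchor) {
            $next = resolveUrl($url, $anchor->getAttribute('href'));
            if ($next !== null
                && parse_url($next, PHP_URL_HOST) === $host
                && !isset($visited[$next])) {
                $queue[] = $next;
            }
        }
    }
    return array_keys($visited); // every URL visited, in crawl order
}
```

For a real site, the fetcher could be something like `function ($url) { $body = @file_get_contents($url); return $body === false ? null : $body; }`, which assumes allow_url_fopen is enabled. A real crawler would probably also check each response's Content-Type header to decide whether a page is HTML, rather than relying on the file extension.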

    Accepted by the question author as the best answer.