doujiaci7976 2012-04-05 09:34
Viewed 19 times
Accepted

How do I find files in a website's directory?

I'm creating a web crawler. I'm going to give it a URL, and it will scan through the directory and subdirectories for .html files. I've been looking at two alternatives:

  1. scandir($url). This works on local files but not over HTTP. Is this because of file permissions? I'm guessing it shouldn't work, since it would be dangerous for everyone to have access to your website files.

  2. Searching for links and following them. I can do file_get_contents on the index file, find the links, and then follow them to their .html files (roughly as sketched at the end of the question).

Do either of these two approaches work, or is there a third alternative?
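
For reference, here is roughly what I have in mind for option 2 (untested sketch; example.com and index.html are just placeholders):

    <?php
    // Untested sketch of option 2: fetch the index page, pull out the
    // href values that end in .html, and fetch each linked page in turn.
    // example.com and index.html are only placeholders.
    $base = 'http://www.example.com/';

    $html = file_get_contents($base . 'index.html');
    if ($html === false) {
        die("Could not fetch the index page\n");
    }

    // Very crude link extraction; enough to show the idea.
    preg_match_all('/href="([^"]+\.html)"/i', $html, $matches);

    foreach ($matches[1] as $link) {
        // Naive: assumes every link is relative to the site root.
        $page = file_get_contents($base . ltrim($link, '/'));
        if ($page !== false) {
            echo "Fetched $link (" . strlen($page) . " bytes)\n";
        }
    }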


2 Answers

  • dtvhqlc57127 2012-04-05 09:39

    The only way to find .html files is to parse the content returned by the server. Unless, by some small chance, they have enabled directory browsing on the server (usually one of the first things disabled), you don't have access to browse directory listings, only the content they are prepared to show you and let you use.

    You would have to start at http://www.mysite.com and work onwards, scanning for links to .html files. And what if they have .asp/.php or other files which then return HTML content?
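
    Something like this is the general idea (rough sketch only; mysite.com is a placeholder, and it only looks one page deep):

        <?php
        // Rough sketch: fetch a page and list every link it contains,
        // using DOMDocument rather than a regex. mysite.com is a placeholder.
        $start = 'http://www.mysite.com/';

        $html = @file_get_contents($start);
        if ($html === false) {
            die("Could not fetch $start\n");
        }

        $doc = new DOMDocument();
        @$doc->loadHTML($html); // suppress warnings about sloppy real-world markup

        foreach ($doc->getElementsByTagName('a') as $a) {
            $href = $a->getAttribute('href');
            // Note: a .php or .asp URL can still return HTML content,
            // so filtering on the .html extension alone will miss pages.
            echo $href, "\n";
        }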

    This answer was accepted by the asker.
