I'm creating a web crawler. I'm going to give it a URL and it will scan the directory and subdirectories for .html files. I've been looking at two alternatives:
scandir($url)
This works on local files but not on HTTP sites. Is that because of file permissions? I'm guessing it shouldn't work, since it would be dangerous if everyone had access to your website's files.

Searching for links and following them. I can do file_get_contents on the index file, find the links, and then follow them to their .html files.
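To make the second idea concrete, here's a rough sketch of what I mean, assuming the page is reachable and parseable with DOMDocument (the function name and the naive relative-URL join are just placeholders, not a finished design):

```php
<?php
// Extract the .html links from a fetched page.
// $html is the page source, $baseUrl is used to resolve relative hrefs.
function findHtmlLinks(string $html, string $baseUrl): array
{
    $doc = new DOMDocument();
    // Suppress warnings from malformed real-world HTML.
    @$doc->loadHTML($html);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        $path = parse_url($href, PHP_URL_PATH);
        // Keep only links whose path ends in .html.
        if (is_string($path) && preg_match('/\.html$/i', $path)) {
            // Absolute links are kept as-is; relative ones get a naive
            // join against the base URL (a real crawler should do better).
            $links[] = preg_match('#^https?://#i', $href)
                ? $href
                : rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
        }
    }
    return array_unique($links);
}

// Usage: fetch the index page, then list the .html links found on it.
// $html = file_get_contents('http://example.com/');
// print_r(findHtmlLinks($html, 'http://example.com'));
```

The crawler would then repeat the same fetch-and-extract step on each discovered link, keeping a "seen" set so pages aren't visited twice.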
Does either of these two work, or is there a third alternative?