I'm grabbing links from a website, but I'm running into a problem: the higher I set the recursion depth for the function, the stranger the results become.
For example, when I call the function like this:
crawl_page("http://www.mangastream.com/", 10);
I get results like this for about half the page:
http://mangastream.com/read/naruto/51619850/1/read/naruto/51619850/2/read/naruto/51619850/2/read/naruto/51619850/2/read/naruto/51619850/2/read/naruto/51619850/2/read/naruto/51619850/2/read/naruto/51619850/2
EDIT
while I'm expecting results like this instead:
http://mangastream.com/manga/read/naruto/51619850/1
Here's the function I've been using to get the results:
function crawl_page($url, $depth)
{
    // Remember which URLs have already been visited across recursive calls
    static $seen = array();
    if (isset($seen[$url]) || $depth === 0) {
        return;
    }
    $seen[$url] = true;

    $dom = new DOMDocument('1.0');
    @$dom->loadHTMLFile($url);

    $anchors = $dom->getElementsByTagName('a');
    foreach ($anchors as $element) {
        $href = $element->getAttribute('href');
        // If the href isn't absolute, append it to the current page URL
        if (0 !== strpos($href, 'http')) {
            $href = rtrim($url, '/') . '/' . ltrim($href, '/');
        }
        // shouldScrape() is my own filter, defined elsewhere
        if (shouldScrape($href) == true) {
            crawl_page($href, $depth - 1);
        }
    }
    echo $url, "";
    //,pageStatus($url)
}
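If it helps, my current guess is that the problem is the line where I glue $href onto $url, since each level of recursion appends another relative path onto a URL that has already been appended to. Here's a rough sketch of what I was thinking of trying instead (just an idea, not tested; resolve_href is a hypothetical helper name, and it assumes every relative href should be resolved against the site root rather than the current directory):

function resolve_href($pageUrl, $href)
{
    // Leave absolute links untouched
    if (0 === strpos($href, 'http')) {
        return $href;
    }
    // Rebuild the link from the scheme and host of the page it was found on,
    // instead of appending it to the full page URL
    $parts = parse_url($pageUrl);
    return $parts['scheme'] . '://' . $parts['host'] . '/' . ltrim($href, '/');
}

I'm not sure this would handle hrefs that are meant to be relative to a subdirectory rather than the site root, so I'd still appreciate pointers on whether this is the right direction.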
Any help with this would be greatly appreciated.