I'm writing a crawler that follows every relative and absolute link it finds. The problem is that when a page contains an incorrect relative URL, and the website answers incorrect URLs with a 200 response code, the crawler keeps building new absolute URLs to crawl.
For example, suppose the page at http://example.com/example.com contains the relative link "example/example.php". When I crawl that page, I resolve the link against the page's URL and get a new URL to crawl: http://example.com/example/example.php. The problem is that this page again contains example/example.php, which now resolves to http://example.com/example/example/example.php, and so on without end.
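To make the behaviour concrete, here is a minimal Python sketch of the resolution chain described above. It assumes standard relative-URL resolution as performed by `urllib.parse.urljoin`. The helper name `simulate_trap` is hypothetical and only for illustration; it is not taken from my actual crawler.

```python
from urllib.parse import urljoin


def simulate_trap(start_url, relative_link, hops):
    """Resolve the same relative link against each newly built URL.

    Hypothetical helper: it mimics a site that serves the same page
    (with the same relative link) for every URL it is asked for.
    """
    urls = [start_url]
    for _ in range(hops):
        # Each hop resolves the link against the most recently built URL.
        urls.append(urljoin(urls[-1], relative_link))
    return urls


chain = simulate_trap("http://example.com/example.com", "example/example.php", 3)
for url in chain:
    print(url)
```

Every hop yields a URL that has not been seen before, so a plain set of visited URLs is not enough to stop the crawl.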
Is there a better way to detect and stop this kind of loop than comparing page content?