doureng5668 2015-10-01 06:06

Relative URL crawling problem

I'm writing a crawler that follows all relative and absolute links. The problem arises when a relative URL is incorrect and the website answers invalid URLs with a 200 response code instead of an error: the crawler then keeps building new absolute URLs from that link indefinitely.

For example, suppose I crawl http://example.com/example.com and the page contains the relative link "example/example.php". I resolve it against the page URL and get a new link to crawl: http://example.com/example/example.php. The problem is that this page again contains example/example.php, which now resolves to http://example.com/example/example/example.php, and so on.
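
To make the behaviour concrete, here is a minimal sketch that reproduces the growing URL using Python's standard `urllib.parse.urljoin`. The helper name `resolve_chain` is made up for illustration, and the crawler itself may resolve links differently.

```python
from urllib.parse import urljoin

def resolve_chain(start_url, relative_link, steps):
    """Resolve the same relative link repeatedly, each time against the
    URL produced by the previous step (hypothetical helper)."""
    urls = [start_url]
    for _ in range(steps):
        # urljoin drops the last path segment of the base URL and
        # appends the relative link in its place.
        urls.append(urljoin(urls[-1], relative_link))
    return urls

chain = resolve_chain("http://example.com/example.com", "example/example.php", 3)
for url in chain:
    print(url)
```

Each step adds one more `example/` segment to the path, so the set of "new" URLs never runs out as long as the server keeps returning 200.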

Is there a better way to avoid this than comparing page content?
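
One idea I am considering, as a rough sketch only: reject URLs whose path repeats the same segment several times in a row or is unusually deep. The function name `looks_like_path_trap` and the thresholds `max_repeats` and `max_depth` are my own assumptions, not something from an existing library, and a legitimate site with paths like /a/a/a/ would be wrongly skipped.

```python
from urllib.parse import urlsplit

def looks_like_path_trap(url, max_repeats=2, max_depth=10):
    """Heuristic (assumed thresholds): True if the path is deeper than
    max_depth or repeats one segment more than max_repeats times in a row."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > max_depth:
        return True
    run = 1  # length of the current run of identical consecutive segments
    for prev, cur in zip(segments, segments[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeats:
            return True
    return False

print(looks_like_path_trap("http://example.com/example/example.php"))
print(looks_like_path_trap("http://example.com/example/example/example/example.php"))
```

With these thresholds the crawler would still fetch the first couple of bogus pages before stopping, so this limits the loop rather than preventing it.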
