douganggu4392 2014-08-04 09:38
36 views

How can I use proxies to deal with a website that is blocking my IP?

I am working on a scraping project to extract data from a website. I wrote a script that goes through a list of URLs, parses the HTML content, and stores the structured data in my database. The script was working fine, but recently it got stuck, and on investigation we found that the target site is blocking our IP.

I am using PHP/cURL for this project, and every web request now fails with a 403 Forbidden error. This has broken my script: no pages can be retrieved, and every request returns the same access-denied response.
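A first step is to make the script notice the block instead of silently stalling or parsing the error page as data. The following is a minimal sketch, assuming PHP 7.4+ with the cURL extension. The function names fetchPage() and looksBlocked() are hypothetical. Treating 429 (Too Many Requests) as a block signal is also an assumption, since the question only reports 403.

```php
<?php
// Hypothetical helper: fetch a URL and return the HTTP status alongside the body,
// so the caller can react to a 403 instead of parsing the error page as data.
function fetchPage(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body as a string instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_CONNECTTIMEOUT => 10,   // seconds allowed for connecting
        CURLOPT_TIMEOUT        => 30,   // seconds allowed for the whole request
    ]);
    $body   = curl_exec($ch);                              // false on transport failure
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no response was received
    $error  = curl_error($ch);                             // empty string if no transport error
    return [
        'status' => $status,
        'body'   => $body === false ? null : $body,
        'error'  => $error,
    ];
}

// Assumption: 403 (Forbidden) and 429 (Too Many Requests) both mean the site
// is refusing this client, so the scraper should pause rather than keep going.
function looksBlocked(int $status): bool
{
    return in_array($status, [403, 429], true);
}

// Usage sketch (performs a real network request, so it is left commented out):
// $res = fetchPage('https://example.com/');
// if (looksBlocked($res['status'])) { /* stop, back off, or switch proxy */ }
```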

I know there are a lot of scraping etiquette rules to follow. Since we can't know how the site has implemented its security measures, I am unsure how to pace (throttle) my requests. I am running on an Amazon AWS instance with an Elastic IP, so I am also unsure when, or whether, they will lift the ban on my IP.
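For pacing, a common approach is to wait a base delay with random jitter between ordinary requests, and to back off exponentially once a block is detected. The following is a minimal sketch. The 2-second base delay, 1-second jitter, 5-second initial backoff, and 10-minute cap are assumed values, not limits published by the target site. The function names are hypothetical.

```php
<?php
// Delay before each ordinary request: a base value plus random jitter,
// so requests do not arrive on a perfectly regular schedule.
// The defaults are assumptions; tune them for the site you are scraping.
function politeDelaySeconds(float $base = 2.0, float $jitter = 1.0): float
{
    return $base + (mt_rand() / mt_getrandmax()) * $jitter;
}

// Exponential backoff after a blocked or failed request:
// attempt 0 -> 5s, 1 -> 10s, 2 -> 20s, ... never more than $max seconds.
function backoffSeconds(int $attempt, float $initial = 5.0, float $max = 600.0): float
{
    return min($max, $initial * (2 ** $attempt));
}

// Sleep for a fractional number of seconds.
function pauseFor(float $seconds): void
{
    usleep((int) round($seconds * 1000000));
}

// Usage sketch:
// pauseFor(politeDelaySeconds());   // between normal requests
// pauseFor(backoffSeconds($tries)); // after the $tries-th consecutive 403
```

The jitter matters mainly because a single IP sending requests at a perfectly fixed interval is easy to spot. The backoff cap keeps the script from sleeping indefinitely while it waits to see whether the block is lifted.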

I have heard of rotating proxies being used for scraping, so that the target server is less likely to block you, but I am not sure how to implement this.
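With PHP/cURL, proxy rotation mostly comes down to setting CURLOPT_PROXY to a different proxy on each request. The following is a minimal round-robin sketch. It assumes you already have a list of proxies you are permitted to use. The ProxyRotator class and the fetchViaProxy() helper are hypothetical, and the addresses are placeholders from the 203.0.113.0/24 documentation range.

```php
<?php
// Round-robin proxy rotation: each call to next() returns the following
// proxy in the list and wraps around to the first one at the end.
final class ProxyRotator
{
    /** @var string[] */
    private array $proxies;
    private int $index = 0;

    public function __construct(array $proxies)
    {
        if ($proxies === []) {
            throw new InvalidArgumentException('At least one proxy is required.');
        }
        $this->proxies = array_values($proxies); // reindex as 0..n-1
    }

    public function next(): string
    {
        $proxy = $this->proxies[$this->index];
        $this->index = ($this->index + 1) % count($this->proxies);
        return $proxy;
    }
}

// Hypothetical helper: fetch one URL through the next proxy in the rotation.
function fetchViaProxy(string $url, ProxyRotator $rotator): array
{
    $proxy = $rotator->next();
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_PROXY          => $proxy, // "host:port"; set CURLOPT_PROXYUSERPWD if the proxy needs auth
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ['proxy' => $proxy, 'status' => $status, 'body' => $body === false ? null : $body];
}

// Usage sketch (placeholder addresses; performs real requests):
// $rotator = new ProxyRotator(['203.0.113.10:8080', '203.0.113.11:8080']);
// $res = fetchViaProxy('https://example.com/page/1', $rotator);
```

Round-robin is the simplest policy. A fuller version would also drop proxies that keep returning 403, and it would still pace requests. Otherwise a fast crawl can simply get each proxy blocked in turn.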

Any help would be highly appreciated. I can provide additional information if necessary.
