I am working on a scraping project to extract data from a website. I wrote a script that goes through a list of URLs, parses the HTML, and stores the structured content in my database. The script was working fine, but recently it got stuck, and on investigation I found that the target site is blocking our IP.
I am using PHP with cURL for this project, and every web request now returns a 403 (Access Forbidden) error. As a result, the script cannot retrieve any pages.
I know there is a lot of scraping etiquette to follow. Since we can't foresee how the site has implemented its security features, I am unsure how to normalize (throttle) my request rate. I'm working on an Amazon AWS instance with an Elastic IP, so I am also unsure when, or whether, they will lift the ban on my IP.
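To make the idea of normalizing request calls concrete, here is a minimal sketch of a throttled fetch loop in PHP/cURL. It waits a randomized interval between requests and backs off exponentially while the server keeps answering with a blocking status. The helper names (`nextDelaySeconds`, `isBlocked`, `politeFetchAll`), the 2-second base delay, the 300-second cap, and the choice of 403/429/503 as "blocked" statuses are all assumptions for illustration, not values known to suit the target site.

```php
<?php
// Hypothetical helper: seconds to wait before the next request.
// The base delay doubles for each consecutive blocked response, is capped
// at $max, and gets up to 1 second of random jitter added on top.
function nextDelaySeconds(int $consecutiveBlocks, float $base = 2.0, float $max = 300.0): float
{
    $delay  = min($max, $base * (2 ** $consecutiveBlocks));
    $jitter = mt_rand(0, 1000) / 1000.0;
    return $delay + $jitter;
}

// Hypothetical helper: statuses treated here as "the server is pushing back".
function isBlocked(int $status): bool
{
    return in_array($status, [403, 429, 503], true);
}

// Fetch URLs one at a time, pausing between requests.
// Returns an array of url => body for the requests that were not blocked.
function politeFetchAll(array $urls): array
{
    $blocks = 0;
    $pages  = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        $body   = curl_exec($ch);
        $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

        if (isBlocked($status)) {
            $blocks++;            // back off harder on the next iteration
        } else {
            $blocks = 0;          // reset once the server answers normally
            if ($body !== false) {
                $pages[$url] = $body;
            }
        }
        // usleep() takes microseconds.
        usleep((int) (nextDelaySeconds($blocks) * 1000000));
    }
    return $pages;
}
```

A call such as `politeFetchAll(['https://example.com/page1', 'https://example.com/page2'])` would fetch the two placeholder URLs with at least 2 seconds between them. Whether this is slow enough for a given site depends on rules I can't see from the outside.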
I have heard that rotating proxies can be used for scraping so that the target server blocks you less often, but I'm not sure about their implementation.
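As a rough sketch of the idea, rotating proxies with PHP/cURL could look something like the following: keep a pool of proxies, send each request through the next one in round-robin order, and move on to another proxy when a request fails. The proxy addresses are placeholders from a documentation-only IP range, and the helper names (`nextProxy`, `fetchViaProxy`, `fetchWithRotation`) are hypothetical. Only `CURLOPT_PROXY` and the other cURL options are real library features.

```php
<?php
// Placeholder proxy pool (203.0.113.0/24 is reserved for documentation).
// Replace with proxies you are actually permitted to use.
$proxies = [
    'http://203.0.113.10:8080',
    'http://203.0.113.11:8080',
    'http://203.0.113.12:8080',
];

// Hypothetical helper: return the next proxy in round-robin order
// and advance the shared index.
function nextProxy(array $proxies, int &$index): string
{
    $proxy = $proxies[$index % count($proxies)];
    $index++;
    return $proxy;
}

// Hypothetical helper: fetch $url through $proxy.
// Returns [HTTP status code, response body or null on transport failure].
function fetchViaProxy(string $url, string $proxy): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_PROXY          => $proxy,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return [$status, $body === false ? null : $body];
}

// Try each proxy in the pool at most once for a single URL.
// Returns the body on the first 200 response, or null if every proxy failed.
function fetchWithRotation(string $url, array $proxies, int &$index): ?string
{
    for ($attempt = 0; $attempt < count($proxies); $attempt++) {
        [$status, $body] = fetchViaProxy($url, nextProxy($proxies, $index));
        if ($status === 200 && $body !== null) {
            return $body;
        }
    }
    return null;
}
```

Usage would be along the lines of `$i = 0; $html = fetchWithRotation('https://example.com/page', $proxies, $i);`, reusing `$i` across calls so that successive URLs start on different proxies.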
Any help would be highly appreciated. I can provide additional information if necessary.