I have a spider class that crawls websites for content when a user makes a request. Each search loads about 30 websites, extracts the relevant information from them, and then standardizes that information.
I have written this in PHP using cURL. Since PHP lacks multitasking, I would like to switch to Java (I am aware of the multi-process cURL approach, but it does not suit my needs). I need an HTTP client that can send POST and GET requests, receive and set cookies, and modify HTTP headers.
I have found HtmlUnit, which seems nifty but does more than I need. Since the package is relatively big and I will have many hundreds of requests per minute, I don't want an overkill solution slowing down my servers.
Do you think this would be an issue, and do you have other suggestions for replacing cURL in Java? Should I use the Java cURL binding? This is a question of efficiency and server load.
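To make the requirements concrete, here is a rough sketch of what I am after, written against the JDK's built-in `java.net.http.HttpClient`. It assumes Java 11 or later. The class name `SpiderClient`, the helper names, the timeouts and the user-agent value are placeholders I made up for illustration. I have not benchmarked it, so I can't say how it compares to the other options under load.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical helper class; names and timeout values are illustrative only.
class SpiderClient {

    // One shared client: stores cookies it receives and sends them back on later requests.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    // GET request with a custom header.
    static HttpRequest buildGet(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .timeout(Duration.ofSeconds(20))
                .GET()
                .build();
    }

    // POST request carrying an already URL-encoded form body.
    static HttpRequest buildPost(String url, String formBody, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .timeout(Duration.ofSeconds(20))
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
    }

    // Starts all requests without blocking; each future completes with the response body.
    static List<CompletableFuture<String>> fetchAll(HttpClient client, List<HttpRequest> requests) {
        return requests.stream()
                .map(r -> client.sendAsync(r, HttpResponse.BodyHandlers.ofString())
                        .thenApply(HttpResponse::body))
                .collect(Collectors.toList());
    }
}
```

The idea would be to build the ~30 requests for one search, pass them to `fetchAll`, and then call `join()` on each future before standardizing the results.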