I want to create a proxy that can use both SOCKS5 (via Tor) and classic proxies, so that my scraper (CasperJS) can make requests through it.
I made a proxy using `http.NewRequest`; here is the code in short:
```go
func main() {
	// ...
	// struct to initialise the proxy params
	p := NewProxy( /* ... */ )
	log.Fatal(http.ListenAndServe(":9999", p))
	// ...
}

func (p Proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// ...
	req, err := http.NewRequest("GET", r.URL.String(), nil)
	// ...
}
```
I need the full URL to make the call. For HTTP this works, but for HTTPS I get `www.example.com:443`.
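The `host:443` form is most likely not a bug in the handler. For an `https://` URL, a client configured with an HTTP proxy (I assume CasperJS/PhantomJS behaves like any other client here) never sends the URL to the proxy: it sends `CONNECT www.example.com:443` and then speaks TLS through the tunnel, so the path and query are encrypted. For plain `http://` it puts the absolute URL on the request line. This sketch feeds both kinds of request line to `http.ReadRequest` (the helper `parseRaw` is hypothetical, for illustration only) to show what `r.URL` holds in each case.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// parseRaw parses a raw HTTP/1.1 request the same way net/http's server side does.
func parseRaw(raw string) (*http.Request, error) {
	return http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
}

func main() {
	// Proxy request for an http:// URL: the absolute URL is on the request line.
	plain, err := parseRaw("GET http://example.com/?a=1&b=2 HTTP/1.1\r\nHost: example.com\r\n\r\n")
	if err != nil {
		panic(err)
	}
	// Proxy request for an https:// URL: only host:port is sent in clear text.
	tunnel, err := parseRaw("CONNECT example.com:443 HTTP/1.1\r\nHost: example.com:443\r\n\r\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(plain.Method, plain.URL.String())
	fmt.Printf("%s host=%s path=%q query=%q\n",
		tunnel.Method, tunnel.URL.Host, tunnel.URL.Path, tunnel.URL.RawQuery)
}
```

In other words, `r.Method == http.MethodConnect` is the signal to stop building a URL and start relaying bytes; the proxy cannot recover `?a=1&b=2` without terminating TLS itself.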
At first I started with a TCP proxy using:
```go
listener, err := net.Listen("tcp", p.from)
connection, err := listener.Accept()
remote, err := net.Dial("tcp", p.to)
```
and `io.Copy(to, from)` to copy the data from the client to the remote server and vice versa. My code was heavily inspired by this gist by vmihailenco.
It works fine for both HTTP and HTTPS because it operates at the TCP layer, BUT the problem with this method is that I can't mix SOCKS5 and HTTP(S) upstreams: I have to set the proxy type in the call:
```
$ casperjs --proxy=127.0.0.1:9999 --proxy-type=socks5 bbuy.js
```
But what I need is for my proxy to send several requests concurrently, each through a different upstream proxy, and to do that I have to modify the transport for each request:
```go
proxyUrl, err := url.Parse("http://proxyIp:proxyPort")
http.DefaultTransport = &http.Transport{Proxy: http.ProxyURL(proxyUrl)}
```
That's why I chose to use `http.NewRequest`, and for that I need the full URL before I can make the call.
The problem now, as I said before, is that when my scraper tries to scrape, for example, `https://example.com?a=1&b=2`, the result I get from `r.URL.String()` is `example.com:443`, whereas for an HTTP website I get the full URL.
Any suggestions or ideas to solve this problem?