dongwei4444 2014-08-14 15:52

Too many redirects, but via which URLs?

I've got a simple web scraper/spider based on goquery, which in turn uses net/http. It works great until I hit a website with too many redirects:

Get http://www.example.com/some/path.html: stopped after 10 redirects

But why? Did it redirect to itself? Did it throw me into some spider jail? I want to know which URLs I got redirected to, and in what order.

The function that produces the error seems to know this, since it checks the length of the slice of requests made so far, but I don't really want to edit the net/http package myself.

Here's that function from http://golang.org/src/pkg/net/http/client.go:

func defaultCheckRedirect(req *Request, via []*Request) error {
    if len(via) >= 10 {
        return errors.New("stopped after 10 redirects")
    }
    return nil
}


  • dongwen9051 2014-08-14 15:57

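    You can pass your own redirect policy to http.Client through its CheckRedirect field. For example, this logs every URL you are redirected to and keeps the default limit of 10: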

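    // Needs imports: errors, log, net/http.
    client := &http.Client{
        // Called before each redirect is followed: req is the next request,
        // via holds the requests already made, oldest first.
        CheckRedirect: func(req *http.Request, via []*http.Request) error {
            log.Println("redirect", req.URL) // log each redirect target
            if len(via) >= 10 {
                return errors.New("stopped after 10 redirects")
            }
            return nil
        },
    }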
    
