I'm writing a web crawler in Go to collect images from the Internet. My crawler works most of the time, but it sometimes fails to fetch certain images.
Here's my snippet:
package main

import (
    "fmt"
    "net/http"
    "time"
)

func main() {
    client := http.Client{Timeout: 3 * time.Second}

    // imageUrl := "https://i.stack.imgur.com/tKsDb.png" // It works well
    imageUrl := "https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg" // It fails

    req, err := http.NewRequest("GET", imageUrl, nil)
    if err != nil {
        fmt.Println(err.Error())
        return
    }
    req.Header.Add("User-Agent", "My Test")

    resp, err := client.Do(req)
    if err != nil {
        fmt.Println(err.Error()) // Fails here
        return
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        fmt.Printf("Failure: %d\n", resp.StatusCode)
    } else {
        fmt.Printf("Success: %d\n", resp.StatusCode)
    }
    fmt.Println("Done")
}
My snippet above works for most URLs (e.g. "https://i.stack.imgur.com/tKsDb.png"), but it doesn't work when it tries to fetch URLs such as "https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg". The error message returned by err.Error() is:

Get https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
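To figure out which phase actually stalls (DNS, TCP connect, TLS handshake, or waiting for the response headers), I guess I could instrument the request with net/http/httptrace. A minimal sketch with the same failing URL and a longer timeout, just for diagnosis:

package main

import (
    "crypto/tls"
    "fmt"
    "net/http"
    "net/http/httptrace"
    "time"
)

func main() {
    url := "https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg"
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        fmt.Println(err)
        return
    }

    // Log each phase of the request so the stalled step becomes visible.
    trace := &httptrace.ClientTrace{
        DNSDone: func(info httptrace.DNSDoneInfo) {
            fmt.Println("DNS done:", info.Addrs, info.Err)
        },
        ConnectDone: func(network, addr string, err error) {
            fmt.Println("Connected:", addr, err)
        },
        TLSHandshakeDone: func(state tls.ConnectionState, err error) {
            fmt.Println("TLS handshake done:", err)
        },
        GotFirstResponseByte: func() {
            fmt.Println("First response byte received")
        },
    }
    req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

    client := http.Client{Timeout: 10 * time.Second}
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println("Request error:", err)
        return
    }
    defer resp.Body.Close()
    fmt.Println("Status:", resp.StatusCode)
}

If everything up to the TLS handshake completes and only the first response byte never arrives, the server is accepting my connection but deliberately never sending headers.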
My Go version is "go1.9.3 darwin/amd64". I can get the image with Google Chrome and also with the curl command, so I don't think I'm blocked by IP address. Besides that, I've changed the User-Agent to look like a real browser's, but still no luck.
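Since a real browser sends more than just User-Agent, one more thing I could try is copying the other request headers Chrome sends as well. A sketch of that idea; the header values below are my guesses at typical browser headers, not anything I know this server checks:

package main

import (
    "fmt"
    "net/http"
    "time"
)

func main() {
    imageUrl := "https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg"
    req, err := http.NewRequest("GET", imageUrl, nil)
    if err != nil {
        fmt.Println(err)
        return
    }

    // Mimic a typical browser request more closely than User-Agent alone.
    // These values are guesses copied from a Chrome request, not known requirements.
    req.Header.Set("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36")
    req.Header.Set("Accept", "image/webp,image/apng,image/*,*/*;q=0.8")
    req.Header.Set("Accept-Language", "en-US,en;q=0.9")
    req.Header.Set("Referer", "https://precious.jp/")

    client := http.Client{Timeout: 10 * time.Second}
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer resp.Body.Close()
    fmt.Println("Status:", resp.StatusCode)
}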
What's wrong with my code? Or is the administrator of precious.jp doing some magic to block my access?