douqin6785 2018-01-30 08:57

Why do I get "net/http: request canceled while waiting for connection" when trying to fetch some images with "net/http"?

I'm writing a web crawler in Go to collect images from the Internet. It works most of the time, but it sometimes fails to fetch an image.

Here's my snippet:

package main

import (
    "fmt"
    "net/http"
    "time"
)

func main() {
    var client http.Client
    var resp *http.Response

    // var imageUrl = "https://i.stack.imgur.com/tKsDb.png"  // It works well
    var imageUrl = "https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg"  // It fails

    req, _ := http.NewRequest("GET", imageUrl, nil)
    req.Header.Add("User-Agent", "My Test")

    client.Timeout = 3 * time.Second
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println(err.Error())  // Fails here
        return
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        fmt.Printf("Failure: %d\n", resp.StatusCode)
    } else {
        fmt.Printf("Success: %d\n", resp.StatusCode)
    }

    fmt.Println("Done")
}

The snippet above works for most URLs (e.g. "https://i.stack.imgur.com/tKsDb.png"), but it fails for URLs such as "https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg". The error message returned by err.Error() is:

Get https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

My Go version is "go1.9.3 darwin/amd64". I can get the image with Google Chrome and with the curl command, so I don't think my IP address is blocked. I've also changed the User-Agent to look like a real browser, but still no luck.

What's wrong with my code? Or is the administrator of precious.jp doing some magic to block my access?
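
Since the error says the timeout expired "while awaiting headers", one way to narrow this down is to record which phases of the request actually complete. The following is a minimal sketch using the standard net/http/httptrace package. The helper names (tracePhases, newTestServer, defaultTimeout) are hypothetical and not part of the original snippet, and a local test server stands in for the real host so that the example runs without network access.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"sync"
	"time"
)

// defaultTimeout is an illustrative value, not taken from the question.
const defaultTimeout = 3 * time.Second

// newTestServer starts a local server that stands in for the remote host.
func newTestServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))
}

// tracePhases performs a GET and returns the names of the phases that
// completed, in the order they were observed.
func tracePhases(url string, timeout time.Duration) ([]string, error) {
	var (
		mu     sync.Mutex // trace hooks may run on different goroutines
		phases []string
	)
	start := time.Now()
	record := func(name string) {
		mu.Lock()
		defer mu.Unlock()
		phases = append(phases, name)
		fmt.Printf("%-20s +%v\n", name, time.Since(start))
	}

	trace := &httptrace.ClientTrace{
		DNSDone:              func(httptrace.DNSDoneInfo) { record("dns done") },
		ConnectDone:          func(network, addr string, err error) { record("connect done") },
		TLSHandshakeDone:     func(tls.ConnectionState, error) { record("tls handshake done") },
		GotConn:              func(httptrace.GotConnInfo) { record("got connection") },
		GotFirstResponseByte: func() { record("first response byte") },
	}

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Add("User-Agent", "My Test")
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	client := &http.Client{Timeout: timeout}
	resp, err := client.Do(req)
	if err == nil {
		resp.Body.Close()
	}

	mu.Lock()
	defer mu.Unlock()
	return append([]string(nil), phases...), err
}

func main() {
	srv := newTestServer()
	defer srv.Close()

	phases, err := tracePhases(srv.URL, defaultTimeout)
	if err != nil {
		fmt.Println("request failed:", err)
	}
	fmt.Println("phases reached:", phases)
}
```

Pointing tracePhases at the failing URL instead of the test server should show the last phase reached before the timeout fires, which hints at whether the delay is in connecting, in the TLS handshake, or in the server's response.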

1 answer

  • dongzheng7165 2018-01-30 13:57

    Since you're using HTTPS, try creating an http.Client with a custom transport and an explicit TLS configuration (see http.Transport), e.g.:

    package main
    
    import (
        "crypto/tls"
        "fmt"
        "net/http"
        "time"
    )
    
    func main() {
        //---------------------- Modification ----------------------
        //Configure TLS, etc.
        tr := &http.Transport{
            TLSClientConfig: &tls.Config{
                InsecureSkipVerify: true, // disables certificate verification; not suitable for production
            },
        }
        client := &http.Client{
            Transport: tr,
            Timeout:   3 * time.Second,
        }
        //---------------------- End of Modification ----------------
    
        // var imageUrl = "https://i.stack.imgur.com/tKsDb.png"  // It works well
        var imageUrl = "https://precious.jp/mwimgs/b/1/-/img_b1ec6cf54ff3a4260fb77d3d3de918a5275780.jpg" // It fails
    
        req, _ := http.NewRequest("GET", imageUrl, nil)
        req.Header.Add("User-Agent", "My Test")
    
        resp, err := client.Do(req)
        if err != nil {
            fmt.Println(err.Error()) // Fails here
            return
        }
        defer resp.Body.Close()
    
        if resp.StatusCode != http.StatusOK {
            fmt.Printf("Failure: %d\n", resp.StatusCode)
        } else {
            fmt.Printf("Success: %d\n", resp.StatusCode)
        }
    
        fmt.Println("Done")
    }
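
The reported error mentions Client.Timeout, so it may also be worth checking whether the 3-second limit is simply too short for a slow host. The sketch below is not part of the answer above. It shows one way to tell a timeout apart from other failures, using errors.As (which assumes Go 1.13 or later, newer than the go1.9.3 mentioned in the question). The helper names and the timeout values are hypothetical, and a deliberately slow local test server stands in for the real site.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// Illustrative values, not taken from the question.
const (
	serverDelay  = 300 * time.Millisecond
	shortTimeout = 50 * time.Millisecond
	longTimeout  = 2 * time.Second
)

// newSlowServer starts a local test server that waits for delay before
// answering, standing in for a remote host that is slow to send headers.
func newSlowServer(delay time.Duration) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(delay)
		w.WriteHeader(http.StatusOK)
	}))
}

// isTimeout reports whether err, or an error it wraps, is a network timeout.
func isTimeout(err error) bool {
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

// fetchStatus issues a GET with the given client timeout and returns the
// response status code.
func fetchStatus(url string, timeout time.Duration) (int, error) {
	client := &http.Client{Timeout: timeout}

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Add("User-Agent", "My Test")

	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	srv := newSlowServer(serverDelay)
	defer srv.Close()

	for _, timeout := range []time.Duration{shortTimeout, longTimeout} {
		status, err := fetchStatus(srv.URL, timeout)
		switch {
		case isTimeout(err):
			fmt.Printf("timeout %v: request timed out\n", timeout)
		case err != nil:
			fmt.Printf("timeout %v: other error: %v\n", timeout, err)
		default:
			fmt.Printf("timeout %v: status %d\n", timeout, status)
		}
	}
}
```

If the real URL succeeds with a longer timeout, the server is slow rather than blocking the request, and a crawler could retry timed-out requests with a larger limit instead of changing the TLS configuration.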
    
