douba05167 2018-06-19 21:48

Unpredictable results with http.Client and goroutines

I'm new to Go, trying to build a system that fetches content from a set of URLs and extracts specific lines with a regex. The problems start when I wrap the code in goroutines: I get a different number of regex results on each run, and many of the fetched lines are duplicates.

max_routines := 3

sem := make(chan int, max_routines) // to control the number of working routines 
var wg sync.WaitGroup
ch_content := make(chan string)

client := http.Client{}

for i:=2; ; i++ { 

    // for testing
    if i>5 {
        break
    }

    // loop should be broken if feebbacks_checstr is found in content
    if loop_break {
        break
    }

    wg.Add(1)
    go func(i int) {

        defer wg.Done()

        sem <- 1 // will block if > max_routines

        final_url = url+a.tm_id+"/page="+strconv.Itoa(i)

        resp, _ := client.Get(final_url)

        var bodyString string 

        if resp.StatusCode == http.StatusOK {
            bodyBytes, _ := ioutil.ReadAll(resp.Body)
            bodyString = string(bodyBytes)
        }

        // checking for stop word in content
        if false == strings.Contains(bodyString, feebbacks_checstr) {

            res2 = regex.FindAllStringSubmatch(bodyString,-1)
            for _,v := range res2 {
                ch_content <- v[1]
            }

        } else {
            loop_break = true
        }

        resp.Body.Close()

        <-sem

    }(i)
}


for {
    select {
        case r := <-ch_content:
            a.feedbacks = append(a.feedbacks, r) // collecting the data 
        case <-time.After(500 * time.Millisecond):
            show(len(a.feedbacks)) // < always different result, many entries in a.feedbacks are duplicates
            fmt.Printf(".")
    }
}

As a result, len(a.feedbacks) is sometimes 130 and sometimes 139, and a.feedbacks contains duplicates. If I remove the duplicates, the number of results is about half of what I'm expecting (109 without duplicates).


1 answer

  • dtu1747 2018-06-20 19:49

    You're creating a closure with that anonymous goroutine function. Notice that final_url is assigned with = rather than :=, which means it's declared outside the closure. Every goroutine therefore shares the same final_url variable, and you have a race condition: some goroutines overwrite final_url before other goroutines have made their request, and that produces the duplicates.

    If you declare final_url inside the goroutine, the goroutines won't step on each other's toes and it should work as you expect.
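    A minimal sketch of the fix, with the HTTP request stripped out so only the variable scoping is shown (buildPageURL and the base URL are hypothetical stand-ins for the asker's url and a.tm_id):

    ```go
    package main

    import (
    	"fmt"
    	"strconv"
    	"sync"
    )

    // buildPageURL mimics the asker's url + a.tm_id + "/page=" + i concatenation.
    func buildPageURL(base, tmID string, page int) string {
    	return base + tmID + "/page=" + strconv.Itoa(page)
    }

    func main() {
    	var wg sync.WaitGroup
    	urls := make(chan string, 4)

    	for i := 2; i <= 5; i++ {
    		wg.Add(1)
    		go func(i int) {
    			defer wg.Done()
    			// := declares finalURL locally, so each goroutine gets its
    			// own copy instead of racing on one shared variable.
    			finalURL := buildPageURL("http://example.com/", "42", i)
    			urls <- finalURL
    		}(i)
    	}
    	wg.Wait()
    	close(urls)

    	seen := map[string]bool{}
    	for u := range urls {
    		seen[u] = true
    	}
    	fmt.Println(len(seen)) // four goroutines, four distinct URLs
    }
    ```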

    That's the simple fix for what you have. A more idiomatic Go approach would be to create an input channel (containing the URLs to request) and an output channel (eventually containing whatever you pull out of the response), and instead of managing the life and death of dozens of goroutines, keep a constant number of worker goroutines alive that drain the input channel.
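    A sketch of that worker-pool shape, with fetch as a hypothetical stand-in for the HTTP request plus regex extraction:

    ```go
    package main

    import (
    	"fmt"
    	"sync"
    )

    // fetch stands in for the real client.Get + regex work.
    func fetch(url string) string {
    	return "content of " + url
    }

    func main() {
    	const numWorkers = 3

    	in := make(chan string)
    	out := make(chan string)

    	// A fixed pool of workers drains the input channel.
    	var wg sync.WaitGroup
    	for w := 0; w < numWorkers; w++ {
    		wg.Add(1)
    		go func() {
    			defer wg.Done()
    			for url := range in {
    				out <- fetch(url)
    			}
    		}()
    	}

    	// Close out once every worker is done, so the collector loop ends.
    	go func() {
    		wg.Wait()
    		close(out)
    	}()

    	// Feed the URLs, then close in so the workers' range loops terminate.
    	go func() {
    		for i := 2; i <= 5; i++ {
    			in <- fmt.Sprintf("http://example.com/page=%d", i)
    		}
    		close(in)
    	}()

    	count := 0
    	for range out {
    		count++
    	}
    	fmt.Println(count) // exactly one result per URL, no duplicates
    }
    ```

    Closing the channels is what replaces the asker's time.After timeout: the collector loop exits deterministically when all workers finish, instead of guessing with a 500 ms idle window.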

