douba05167 2018-06-19 21:48
40 views
Accepted

Unpredictable results with http.Client and goroutines

I'm new to Go, trying to build a system that fetches content from a set of URLs and extracts specific lines with a regex. The problems start when I wrap the code in goroutines: I get a different number of regex results on each run, and many of the fetched lines are duplicates.

max_routines := 3

sem := make(chan int, max_routines) // to control the number of working routines 
var wg sync.WaitGroup
ch_content := make(chan string)

client := http.Client{}

for i:=2; ; i++ { 

    // for testing
    if i>5 {
        break
    }

    // loop should be broken if feebbacks_checstr is found in content
    if loop_break {
        break
    }

    wg.Add(1)
    go func(i int) {

        defer wg.Done()

        sem <- 1 // will block if > max_routines

        final_url = url+a.tm_id+"/page="+strconv.Itoa(i)

        resp, _ := client.Get(final_url)

        var bodyString string 

        if resp.StatusCode == http.StatusOK {
            bodyBytes, _ := ioutil.ReadAll(resp.Body)
            bodyString = string(bodyBytes)
        }

        // checking for stop word in content
        if false == strings.Contains(bodyString, feebbacks_checstr) {

            res2 = regex.FindAllStringSubmatch(bodyString,-1)
            for _,v := range res2 {
                ch_content <- v[1]
            }

        } else {
            loop_break = true
        }

        resp.Body.Close()

        <-sem

    }(i)
}


for {
    select {
        case r := <-ch_content:
            a.feedbacks = append(a.feedbacks, r) // collecting the data 
        case <-time.After(500 * time.Millisecond):
            show(len(a.feedbacks)) // < always different result, many entries in a.feedbacks are duplicates
            fmt.Printf(".")
    }
}

As a result, len(a.feedbacks) is sometimes 130 and sometimes 139, and a.feedbacks contains duplicates. If I remove the duplicates, the number of results is about half of what I'm expecting (109 without duplicates).


1 Answer

  • dtu1747 2018-06-20 19:49

    You're creating a closure by using an anonymous goroutine function. Notice that final_url is assigned with = rather than :=, which means it's declared outside the closure. All goroutines share the same final_url variable, so there's a data race: some goroutines overwrite final_url before other goroutines have made their requests, and that produces the duplicates.

    If you declare final_url inside the goroutine (with :=), the goroutines won't step on each other's toes and it should work as you expect.

    That's the simple fix for what you have. A more idiomatically Go way to do this would be to create an input channel (containing the URLs to request) and an output channel (eventually containing whatever you're pulling out of the response), and instead of trying to manage the life and death of dozens of goroutines, keep a constant number of goroutines alive that drain the input channel.

    This answer was selected as the best answer by the asker.
