douba05167 2018-06-19 21:48
40 views
Accepted

Unpredictable results with http.Client and goroutines

I'm new to Go, trying to build a system that fetches content from a set of URLs and extracts specific lines with a regex. The problems start when I wrap the code in goroutines: I get a different number of regex results on each run, and many of the fetched lines are duplicates.

max_routines := 3

sem := make(chan int, max_routines) // to control the number of working routines 
var wg sync.WaitGroup
ch_content := make(chan string)

client := http.Client{}

for i:=2; ; i++ { 

    // for testing
    if i>5 {
        break
    }

    // loop should be broken if feebbacks_checstr is found in content
    if loop_break {
        break
    }

    wg.Add(1)
    go func(i int) {

        defer wg.Done()

        sem <- 1 // will block if > max_routines

        final_url = url+a.tm_id+"/page="+strconv.Itoa(i)

        resp, _ := client.Get(final_url)

        var bodyString string 

        if resp.StatusCode == http.StatusOK {
            bodyBytes, _ := ioutil.ReadAll(resp.Body)
            bodyString = string(bodyBytes)
        }

        // checking for stop word in content
        if false == strings.Contains(bodyString, feebbacks_checstr) {

            res2 = regex.FindAllStringSubmatch(bodyString,-1)
            for _,v := range res2 {
                ch_content <- v[1]
            }

        } else {
            loop_break = true
        }

        resp.Body.Close()

        <-sem

    }(i)
}


for {
    select {
        case r := <-ch_content:
            a.feedbacks = append(a.feedbacks, r) // collecting the data 
        case <-time.After(500 * time.Millisecond):
            show(len(a.feedbacks)) // < always different result, many entries in a.feedbacks are duplicates
            fmt.Printf(".")
    }
}

As a result, len(a.feedbacks) is sometimes 130 and sometimes 139, and a.feedbacks contains duplicates. If I remove the duplicates, the number of results is about half of what I'm expecting (109 without duplicates).


1 Answer

  • dtu1747 2018-06-20 19:49

    You're creating a closure by using an anonymous goroutine function. Notice that final_url is assigned with = rather than :=, which means it's declared outside the closure. All goroutines share the same final_url variable, so there's a data race: some goroutines overwrite final_url before other goroutines have made their requests, and that produces the duplicates.

    If you declare final_url inside the goroutine (with :=), the goroutines won't step on each other's toes and it should work as you expect.

    That's the simple fix for what you have. A more idiomatically Go way to do this would be to create an input channel (containing the URLs to request) and an output channel (eventually containing whatever you're pulling out of the response), and instead of trying to manage the life and death of dozens of goroutines, keep a constant number of goroutines alive that drain the input channel.

    This answer was selected as the best answer by the asker.
