dongwupei7803 2019-01-18 21:49
Viewed 20 times

Leaking goroutines: usually about three times as many running as I want

I'm trying to make a web scraper that can run a decent number (many thousands) of HTTP queries per minute. The actual querying is fine, but to speed up the process I'm trying to make it concurrent. Initially I spawned a goroutine for each request, but I ran out of file descriptors, so after some googling I decided to use a semaphore to limit the number of concurrent goroutines.

Only I can't get this to work.

I've tried moving bits of code around, but I always have the same issue: roughly three times as many goroutines running as I want.

This is the only method I have that spawns goroutines. I limited the goroutines to 80. In my benchmarks I run this against a slice of 10000 URLs and it tends to hover at about 242 concurrent goroutines in flight, but then it suddenly goes up to almost double this and then back down to 242.

I get the same behaviour if I change the concurrent value from 80 - it usually hovers at just over three times the number of goroutines and sometimes spikes to around double that and I have no idea why.

func (B BrandScraper) ScrapeUrls(URLs ...string) []scrapeResponse {
    concurrent := 80
    semaphoreChan := make(chan struct{}, concurrent)
    scrapeResults := make([]scrapeResponse, len(URLs))
    for _, URL := range URLs {
        semaphoreChan <- struct{}{}
        go func(URL string) {
            defer func() {
                <-semaphoreChan
            }()
            scrapeResults = append(scrapeResults,
                B.getIndividualScrape(URL))
            fmt.Printf("#goroutines: %d\n", runtime.NumGoroutine())
        }(URL)
    }
    return scrapeResults
}
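For comparison, here is a minimal sketch of the same bounded-concurrency pattern with two changes that are commonly needed in code shaped like the above: a sync.WaitGroup so the function doesn't return before the workers finish, and writes into results by index rather than append (appending to a shared slice from many goroutines is a data race). The function name scrapeAll and the stubbed result string are illustrative, not the original getIndividualScrape:

```go
package main

import (
	"fmt"
	"sync"
)

// scrapeAll bounds concurrency with a buffered channel used as a
// semaphore and waits for every worker with a sync.WaitGroup.
func scrapeAll(urls []string, concurrent int) []string {
	semaphore := make(chan struct{}, concurrent)
	results := make([]string, len(urls))
	var wg sync.WaitGroup
	for i, url := range urls {
		wg.Add(1)
		semaphore <- struct{}{} // blocks once `concurrent` workers are in flight
		go func(i int, url string) {
			defer wg.Done()
			defer func() { <-semaphore }()
			// Each goroutine owns results[i], so no mutex is needed.
			results[i] = "scraped:" + url // stand-in for the real scrape
		}(i, url)
	}
	wg.Wait() // without this, the function returns before workers finish
	return results
}

func main() {
	fmt.Println(scrapeAll([]string{"a", "b", "c"}, 2))
}
```

Because the slice in the original is created with make([]scrapeResponse, len(URLs)), append grows it past its length on top of the race, so indexed writes also fix the doubled-length result.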

I'm expecting it to be constantly at 80 goroutines - or at least constant.

This happens whether I run it from a benchmarking test or from the main function.

Thanks very much for any tips!

EDIT

getIndividualScrape

calls another function:

func (B BrandScraper) doGetRequest(URL string) io.Reader {
    resp, err := http.Get(URL)
    if err != nil {
        log.Fatal(err)
    }
    body, _ := ioutil.ReadAll(resp.Body)
    resp.Body.Close()
    return bytes.NewReader(body)
}

which obviously does an HTTP request. Could this be leaking goroutines? I thought since I'd closed the resp.Body I'd have covered that but maybe not?
