dongtang1918 2015-08-09 22:06

Golang stuck in WaitGroup

I'm stuck in my own wait loop and I'm not sure why. The function takes an input channel and an output channel. For each item on the input channel, it performs an HTTP GET for the content and pulls the <title> tag from the HTML.

The GET and scrape run inside a goroutine, and I've set up a wait group (innerWait) to be sure that everything has been processed before I close the output channel.

func (fp FeedProducer) getTitles(in <-chan feeds.Item,
    out chan<- feeds.Item,
    wg *sync.WaitGroup) {

    defer wg.Done()

    var innerWait sync.WaitGroup

    for item := range in {
        log.Infof(fp.c, "Incrementing inner WaitGroup.")
        innerWait.Add(1)
        go func(item feeds.Item) {
            defer innerWait.Done()
            defer log.Infof(fp.c, "Decrement inner wait group by defer.")
            client := urlfetch.Client(fp.c)
            resp, err := client.Get(item.Link.Href)
            log.Infof(fp.c, "Getting title for: %v", item.Link.Href)
            if err != nil {
                log.Errorf(fp.c, "Error retrieving page. %v", err.Error())
                return
            }
            if strings.ToLower(resp.Header.Get("Content-Type")) == "text/html; charset=utf-8" {
                title := fp.scrapeTitle(resp)
                item.Title = title
            } else {
                log.Errorf(fp.c, "Wrong content type.  Received: %v from %v", resp.Header.Get("Content-Type"), item.Link.Href)
            }
            out <- item
        }(item)
    }
    log.Infof(fp.c, "Waiting for title pull wait group.")
    innerWait.Wait()
    log.Infof(fp.c, "Done waiting for title pull.")
    close(out)
}

func (fp FeedProducer) scrapeTitle(request *http.Response) string {
    defer request.Body.Close()
    tokenizer := html.NewTokenizer(request.Body)
    var titleIsNext bool
    for {
        token := tokenizer.Next()
        switch {
        case token == html.ErrorToken:
            log.Infof(fp.c, "Hit the end of the doc without finding title.")
            return ""
        case token == html.StartTagToken:
            tag := tokenizer.Token()
            isTitle := tag.Data == "title"

            if isTitle {
                titleIsNext = true
            }
        case titleIsNext && token == html.TextToken:
            title := tokenizer.Token().Data
            log.Infof(fp.c, "Pulled title: %v", title)
            return title
        }
    }
}

Log content looks like this:

2015/08/09 22:02:10 INFO: Revived query parameter: golang
2015/08/09 22:02:10 INFO: Getting active tweets from the last 7 days.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Waiting for title pull wait group.
2015/08/09 22:02:10 INFO: Getting title for: http://devsisters.github.io/goquic/
2015/08/09 22:02:10 INFO: Pulled title: GoQuic by devsisters
2015/08/09 22:02:10 INFO: Getting title for: http://whizdumb.me/2015/03/03/matching-a-string-and-extracting-values-using-regex/
2015/08/09 22:02:10 INFO: Pulled title: Matching a string and extracting values using regex | Whizdumb's blog
2015/08/09 22:02:10 INFO: Getting title for: https://www.reddit.com/r/golang/comments/3g7tyv/dropboxs_infrastructure_is_go_at_a_huge_scale/
2015/08/09 22:02:10 INFO: Pulled title: Dropbox's infrastructure is Go at a huge scale : golang
2015/08/09 22:02:10 INFO: Getting title for: http://dave.cheney.net/2015/08/08/performance-without-the-event-loop
2015/08/09 22:02:10 INFO: Pulled title: Performance without the event loop | Dave Cheney
2015/08/09 22:02:11 INFO: Getting title for: https://github.com/ccirello/sublime-gosnippets
2015/08/09 22:02:11 INFO: Pulled title: ccirello/sublime-gosnippets · GitHub
2015/08/09 22:02:11 INFO: Getting title for: https://medium.com/iron-io-blog/an-easier-way-to-create-tiny-golang-docker-images-7ba2893b160?mkt_tok=3RkMMJWWfF9wsRonuqTMZKXonjHpfsX57ewoWaexlMI/0ER3fOvrPUfGjI4ATsNrI%2BSLDwEYGJlv6SgFQ7LMMaZq1rgMXBk%3D&utm_content=buffer45a1c&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
2015/08/09 22:02:11 INFO: Pulled title: An Easier Way to Create Tiny Golang Docker Images — Iron.io Blog — Medium

I can see that I'm getting to the innerWait.Wait() command based on the logs, which also tells me that the inbound channel has been closed on the other side of the pipe.

It would appear that the defer statements in the anonymous function are not being called, as I can't see the deferred log statement printed anywhere. But I can't for the life of me tell why as all code in that block appears to execute.

Help is appreciated.


1 answer

  • dongluan1743 2015-08-10 00:59

    The goroutines are stuck sending to out at this line:

            out <- item
    

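    Because the send never completes, the deferred innerWait.Done() calls never run, so innerWait.Wait() blocks forever. The fix is to start a goroutine to receive on out.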
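
    A minimal sketch of that arrangement, not the asker's actual code: process is a hypothetical stand-in for getTitles that works on strings instead of feeds.Item, strings.ToUpper replaces the fetch and scrape, and collect is a made-up caller. The receive loop on out runs while the workers are sending, so every send can complete, every deferred Done runs, and Wait returns.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// process is a hypothetical stand-in for getTitles: it starts one goroutine
// per input item, sends each result on out, and closes out once every
// worker has finished.
func process(in <-chan string, out chan<- string) {
	var innerWait sync.WaitGroup
	for item := range in {
		innerWait.Add(1)
		go func(item string) {
			defer innerWait.Done()
			// Stand-in for the HTTP fetch and title scrape.
			out <- strings.ToUpper(item)
		}(item)
	}
	innerWait.Wait()
	close(out)
}

// collect wires the pipeline together and returns the sorted results.
func collect(items []string) []string {
	in := make(chan string)
	out := make(chan string) // unbuffered: each send needs a receiver

	go process(in, out)

	// Feed the input from its own goroutine so the receive loop below
	// can start immediately.
	go func() {
		defer close(in)
		for _, it := range items {
			in <- it
		}
	}()

	// The receiver: without this loop the workers would block on
	// "out <- ..." and innerWait.Wait() would never return.
	var results []string
	for r := range out {
		results = append(results, r)
	}
	sort.Strings(results)
	return results
}

func main() {
	fmt.Println(collect([]string{"a", "b", "c"})) // prints [A B C]
}
```

    Giving out a buffer large enough for every item would also let the sends complete, but that only works when the number of items is known in advance. A concurrent receiver does not depend on the count.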

    A good way to debug issues like this is to dump the goroutine stacks by sending the process a SIGQUIT.
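
    Where sending a signal is awkward, the same information can be obtained from inside the program with runtime.Stack. The sketch below is an illustration only; dumpStacks, blockedSenderDump and hasChanSend are hypothetical helper names. It reproduces a sender blocked on a channel nobody reads and shows that the dump marks that goroutine as waiting in "chan send".

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"time"
)

// dumpStacks returns the stack traces of all goroutines, similar to what
// the Go runtime prints when the process receives SIGQUIT.
func dumpStacks() string {
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true) // true: include every goroutine
	return string(buf[:n])
}

// hasChanSend reports whether a dump contains a goroutine blocked on a
// channel send.
func hasChanSend(dump string) bool {
	return strings.Contains(dump, "chan send")
}

// blockedSenderDump starts a goroutine that blocks sending on a channel
// with no receiver, then polls until the dump shows it (or gives up after
// roughly one second) and returns the last dump taken.
func blockedSenderDump() string {
	out := make(chan int)
	go func() { out <- 1 }() // blocks forever: nobody receives on out

	var dump string
	for i := 0; i < 100; i++ {
		time.Sleep(10 * time.Millisecond)
		dump = dumpStacks()
		if hasChanSend(dump) {
			break
		}
	}
	return dump
}

func main() {
	fmt.Print(blockedSenderDump())
}
```

    In a dump taken from the code in the question, one would expect one such "chan send" entry per worker goroutine, each pointing at the out <- item line.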

    This answer was accepted by the asker.