dongtang1918 2015-08-09 22:06

Golang stuck in WaitGroup

I'm stuck in my own wait loop and not really sure why. The function takes an input and an output channel, then for each item on the input channel issues an HTTP GET for the content and pulls the title tag from the HTML.

The GET and scrape run inside a goroutine, and I've set up a wait group (innerWait) to be sure that I've processed everything before closing the output channel.

func (fp FeedProducer) getTitles(in <-chan feeds.Item,
    out chan<- feeds.Item,
    wg *sync.WaitGroup) {

    defer wg.Done()

    var innerWait sync.WaitGroup

    for item := range in {
        log.Infof(fp.c, "Incrementing inner WaitGroup.")
        innerWait.Add(1)
        go func(item feeds.Item) {
            defer innerWait.Done()
            defer log.Infof(fp.c, "Decrement inner wait group by defer.")
            client := urlfetch.Client(fp.c)
            resp, err := client.Get(item.Link.Href)
            log.Infof(fp.c, "Getting title for: %v", item.Link.Href)
            if err != nil {
                log.Errorf(fp.c, "Error retrieving page. %v", err.Error())
                return
            }
            if strings.ToLower(resp.Header.Get("Content-Type")) == "text/html; charset=utf-8" {
                title := fp.scrapeTitle(resp)
                item.Title = title
            } else {
                log.Errorf(fp.c, "Wrong content type.  Received: %v from %v", resp.Header.Get("Content-Type"), item.Link.Href)
            }
            out <- item
        }(item)
    }
    log.Infof(fp.c, "Waiting for title pull wait group.")
    innerWait.Wait()
    log.Infof(fp.c, "Done waiting for title pull.")
    close(out)
}

func (fp FeedProducer) scrapeTitle(request *http.Response) string {
    defer request.Body.Close()
    tokenizer := html.NewTokenizer(request.Body)
    var titleIsNext bool
    for {
        token := tokenizer.Next()
        switch {
        case token == html.ErrorToken:
            log.Infof(fp.c, "Hit the end of the doc without finding title.")
            return ""
        case token == html.StartTagToken:
            tag := tokenizer.Token()
            isTitle := tag.Data == "title"

            if isTitle {
                titleIsNext = true
            }
        case titleIsNext && token == html.TextToken:
            title := tokenizer.Token().Data
            log.Infof(fp.c, "Pulled title: %v", title)
            return title
        }
    }
}

Log content looks like this:

2015/08/09 22:02:10 INFO: Revived query parameter: golang
2015/08/09 22:02:10 INFO: Getting active tweets from the last 7 days.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Waiting for title pull wait group.
2015/08/09 22:02:10 INFO: Getting title for: http://devsisters.github.io/goquic/
2015/08/09 22:02:10 INFO: Pulled title: GoQuic by devsisters
2015/08/09 22:02:10 INFO: Getting title for: http://whizdumb.me/2015/03/03/matching-a-string-and-extracting-values-using-regex/
2015/08/09 22:02:10 INFO: Pulled title: Matching a string and extracting values using regex | Whizdumb's blog
2015/08/09 22:02:10 INFO: Getting title for: https://www.reddit.com/r/golang/comments/3g7tyv/dropboxs_infrastructure_is_go_at_a_huge_scale/
2015/08/09 22:02:10 INFO: Pulled title: Dropbox's infrastructure is Go at a huge scale : golang
2015/08/09 22:02:10 INFO: Getting title for: http://dave.cheney.net/2015/08/08/performance-without-the-event-loop
2015/08/09 22:02:10 INFO: Pulled title: Performance without the event loop | Dave Cheney
2015/08/09 22:02:11 INFO: Getting title for: https://github.com/ccirello/sublime-gosnippets
2015/08/09 22:02:11 INFO: Pulled title: ccirello/sublime-gosnippets · GitHub
2015/08/09 22:02:11 INFO: Getting title for: https://medium.com/iron-io-blog/an-easier-way-to-create-tiny-golang-docker-images-7ba2893b160?mkt_tok=3RkMMJWWfF9wsRonuqTMZKXonjHpfsX57ewoWaexlMI/0ER3fOvrPUfGjI4ATsNrI%2BSLDwEYGJlv6SgFQ7LMMaZq1rgMXBk%3D&utm_content=buffer45a1c&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
2015/08/09 22:02:11 INFO: Pulled title: An Easier Way to Create Tiny Golang Docker Images — Iron.io Blog — Medium

I can see that I'm getting to the innerWait.Wait() command based on the logs, which also tells me that the inbound channel has been closed on the other side of the pipe.

It would appear that the deferred calls in the anonymous function are never run, as I can't see the deferred log statement printed anywhere. But I can't for the life of me tell why, as all the code in that block appears to execute.

Help is appreciated.

1 answer

  • dongluan1743 2015-08-10 00:59

    The goroutines are stuck sending to out at this line:

            out <- item
    

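    A send on out blocks until a receiver is ready (or the channel has free buffer space), so the goroutines never return, the deferred innerWait.Done() never runs, and innerWait.Wait() never returns. The fix is to start a goroutine to receive on out.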
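
    As a rough sketch of that fix, the program below mirrors the shape of getTitles but swaps feeds.Item for plain strings and the HTTP fetch for a hypothetical fetchTitle stand-in, so it runs without App Engine. The collect helper and the done channel are assumptions added for illustration, not part of the original code. The point is only that the receiving goroutine is started before getTitles is called.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// fetchTitle is a hypothetical stand-in for the HTTP GET and title scrape.
func fetchTitle(link string) string {
	return strings.ToUpper(link)
}

// getTitles has the same structure as the function in the question:
// one goroutine per item, an inner WaitGroup, then close(out).
func getTitles(in <-chan string, out chan<- string) {
	var innerWait sync.WaitGroup
	for link := range in {
		innerWait.Add(1)
		go func(link string) {
			defer innerWait.Done()
			out <- fetchTitle(link) // blocks until a receiver takes the value
		}(link)
	}
	innerWait.Wait()
	close(out)
}

// collect wires up the pipeline and returns the sorted results.
func collect(links []string) []string {
	in := make(chan string)
	out := make(chan string) // unbuffered, so every send needs a receiver

	// Producer: feed the links, then close the input channel.
	go func() {
		defer close(in)
		for _, l := range links {
			in <- l
		}
	}()

	// Receiver: started before getTitles so the sends can complete.
	var titles []string
	done := make(chan struct{})
	go func() {
		defer close(done)
		for t := range out {
			titles = append(titles, t)
		}
	}()

	getTitles(in, out) // returns once all workers have sent and out is closed
	<-done             // wait for the receiver to drain out
	sort.Strings(titles)
	return titles
}

func main() {
	fmt.Println(collect([]string{"a", "b", "c"}))
}
```

    Removing the receiver goroutine from collect reproduces the hang: every worker parks on the send and innerWait.Wait() never returns.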

    A good way to debug issues like this is to dump the goroutine stacks by sending the process a SIGQUIT.
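
    Where sending a signal is not convenient, a similar dump can be produced from inside the program. The sketch below uses runtime/pprof instead of SIGQUIT, which is a substitution and not what the answer describes; the dumpGoroutines, hasBlockedSend and demo names are hypothetical helpers. A goroutine stuck on a send shows up in the dump with the state "chan send".

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"strings"
	"time"
)

// dumpGoroutines returns the stacks of all current goroutines as text.
func dumpGoroutines() string {
	var buf bytes.Buffer
	// debug=2 uses the same format as the trace printed on an unrecovered panic.
	_ = pprof.Lookup("goroutine").WriteTo(&buf, 2)
	return buf.String()
}

// hasBlockedSend reports whether any goroutine in the dump is parked on a send.
func hasBlockedSend(dump string) bool {
	return strings.Contains(dump, "chan send")
}

// demo starts a goroutine that blocks on a send, as in the question,
// and returns a dump taken once that goroutine has parked.
func demo() string {
	out := make(chan int) // nobody ever receives from this channel
	go func() { out <- 1 }()

	var dump string
	for i := 0; i < 100; i++ {
		dump = dumpGoroutines()
		if hasBlockedSend(dump) {
			break
		}
		time.Sleep(10 * time.Millisecond) // give the goroutine time to reach the send
	}
	return dump
}

func main() {
	dump := demo()
	fmt.Println("blocked send found:", hasBlockedSend(dump))
	fmt.Print(dump)
}
```

    In the dump, the stack under the "chan send" goroutine points at the blocking line, which in the question's code would be the out <- item statement.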

    The asker accepted this as the best answer.
