I'm stuck in my own wait loop and I'm not sure why. The function takes an input channel and an output channel; for each item received on the input channel, it executes an HTTP GET for the page content and pulls the title tag from the HTML.
The GET-and-scrape step runs inside a goroutine, and I've set up a WaitGroup (innerWait) to make sure everything has been processed before I close the output channel.
func (fp FeedProducer) getTitles(in <-chan feeds.Item,
	out chan<- feeds.Item,
	wg *sync.WaitGroup) {
	defer wg.Done()
	var innerWait sync.WaitGroup
	for item := range in {
		log.Infof(fp.c, "Incrementing inner WaitGroup.")
		innerWait.Add(1)
		go func(item feeds.Item) {
			defer innerWait.Done()
			defer log.Infof(fp.c, "Decrement inner wait group by defer.")
			client := urlfetch.Client(fp.c)
			resp, err := client.Get(item.Link.Href)
			log.Infof(fp.c, "Getting title for: %v", item.Link.Href)
			if err != nil {
				log.Errorf(fp.c, "Error retrieving page. %v", err.Error())
				return
			}
			if strings.ToLower(resp.Header.Get("Content-Type")) == "text/html; charset=utf-8" {
				// scrapeTitle closes resp.Body.
				title := fp.scrapeTitle(resp)
				item.Title = title
			} else {
				resp.Body.Close() // not handed to scrapeTitle, so close it here
				log.Errorf(fp.c, "Wrong content type. Received: %v from %v", resp.Header.Get("Content-Type"), item.Link.Href)
			}
			out <- item
		}(item)
	}
	log.Infof(fp.c, "Waiting for title pull wait group.")
	innerWait.Wait()
	log.Infof(fp.c, "Done waiting for title pull.")
	close(out)
}
func (fp FeedProducer) scrapeTitle(resp *http.Response) string {
	defer resp.Body.Close()
	tokenizer := html.NewTokenizer(resp.Body)
	var titleIsNext bool
	for {
		token := tokenizer.Next()
		switch {
		case token == html.ErrorToken:
			log.Infof(fp.c, "Hit the end of the doc without finding title.")
			return ""
		case token == html.StartTagToken:
			tag := tokenizer.Token()
			isTitle := tag.Data == "title"
			if isTitle {
				titleIsNext = true
			}
		case titleIsNext && token == html.TextToken:
			title := tokenizer.Token().Data
			log.Infof(fp.c, "Pulled title: %v", title)
			return title
		}
	}
}
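Separately from the hang, the Content-Type check in getTitles compares against the exact string `text/html; charset=utf-8`, so a server that sends plain `text/html`, or omits the space before `charset`, would be logged as the wrong content type. A sketch of a more tolerant check using the standard library's `mime.ParseMediaType` (the helper name `isHTML` is hypothetical, not part of the original code):

```go
package main

import (
	"fmt"
	"mime"
)

// isHTML reports whether a Content-Type header value denotes HTML,
// ignoring parameters such as charset and differences in case or spacing.
func isHTML(contentType string) bool {
	// ParseMediaType lowercases and trims the media type and splits off parameters.
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false // empty or malformed header
	}
	return mediaType == "text/html"
}

func main() {
	fmt.Println(isHTML("text/html; charset=utf-8"))     // true
	fmt.Println(isHTML("text/html"))                    // true
	fmt.Println(isHTML("TEXT/HTML;charset=ISO-8859-1")) // true
	fmt.Println(isHTML("application/json"))             // false
}
```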
The log output looks like this:
2015/08/09 22:02:10 INFO: Revived query parameter: golang
2015/08/09 22:02:10 INFO: Getting active tweets from the last 7 days.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Incrementing inner WaitGroup.
2015/08/09 22:02:10 INFO: Waiting for title pull wait group.
2015/08/09 22:02:10 INFO: Getting title for: http://devsisters.github.io/goquic/
2015/08/09 22:02:10 INFO: Pulled title: GoQuic by devsisters
2015/08/09 22:02:10 INFO: Getting title for: http://whizdumb.me/2015/03/03/matching-a-string-and-extracting-values-using-regex/
2015/08/09 22:02:10 INFO: Pulled title: Matching a string and extracting values using regex | Whizdumb's blog
2015/08/09 22:02:10 INFO: Getting title for: https://www.reddit.com/r/golang/comments/3g7tyv/dropboxs_infrastructure_is_go_at_a_huge_scale/
2015/08/09 22:02:10 INFO: Pulled title: Dropbox's infrastructure is Go at a huge scale : golang
2015/08/09 22:02:10 INFO: Getting title for: http://dave.cheney.net/2015/08/08/performance-without-the-event-loop
2015/08/09 22:02:10 INFO: Pulled title: Performance without the event loop | Dave Cheney
2015/08/09 22:02:11 INFO: Getting title for: https://github.com/ccirello/sublime-gosnippets
2015/08/09 22:02:11 INFO: Pulled title: ccirello/sublime-gosnippets · GitHub
2015/08/09 22:02:11 INFO: Getting title for: https://medium.com/iron-io-blog/an-easier-way-to-create-tiny-golang-docker-images-7ba2893b160?mkt_tok=3RkMMJWWfF9wsRonuqTMZKXonjHpfsX57ewoWaexlMI/0ER3fOvrPUfGjI4ATsNrI%2BSLDwEYGJlv6SgFQ7LMMaZq1rgMXBk%3D&utm_content=buffer45a1c&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
2015/08/09 22:02:11 INFO: Pulled title: An Easier Way to Create Tiny Golang Docker Images — Iron.io Blog — Medium
Based on the logs, execution reaches innerWait.Wait(), which also tells me that the inbound channel has been closed on the other side of the pipe (a range loop over a channel only exits once the channel is closed).
It appears that the deferred statements in the anonymous function are never called, since the deferred log message never shows up in the output. But I can't for the life of me tell why, as all the other code in that block appears to execute.
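One thing worth ruling out, assuming `out` is unbuffered and whatever reads it only starts after getTitles returns (that calling code is not shown above): deferred calls run only when the surrounding function returns, so a goroutine blocked forever on `out <- item` never reaches either defer, and `innerWait.Wait()` never returns. The sketch below reproduces that situation in isolation; `demo` and `waitsWithin` are hypothetical helpers, and the 200 ms timeout is arbitrary.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitsWithin reports whether wg.Wait returns before the timeout d.
func waitsWithin(wg *sync.WaitGroup, d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}

// demo starts n workers that each send one value on an unbuffered channel.
// When drain is false nobody receives, so every send blocks forever and
// the deferred wg.Done calls never run.
func demo(n int, drain bool) bool {
	out := make(chan int) // unbuffered, like a plain make(chan T)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done() // runs only after the send below completes
			out <- i        // blocks until a receiver is ready
		}(i)
	}
	if drain {
		// A concurrent receiver lets every send, and so every defer, finish.
		go func() {
			for j := 0; j < n; j++ {
				<-out
			}
		}()
	}
	return waitsWithin(&wg, 200*time.Millisecond)
}

func main() {
	fmt.Println("no receiver, Wait returned:", demo(3, false))
	fmt.Println("with receiver, Wait returned:", demo(3, true))
}
```

Without a receiver, Wait should time out; with one running, Wait returns because every worker's send completes and its deferred Done runs.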
Help is appreciated.