duanqie5741 2012-11-04 09:49
30 views
Accepted

Trouble with the Go tour crawler exercise

I'm going through the Go tour and I feel like I have a pretty good understanding of the language, except for concurrency.

On slide 72 there is an exercise that asks the reader to parallelize a web crawler (and to make it not fetch the same URL twice, but I haven't gotten to that part yet).

Here is what I have so far:

func Crawl(url string, depth int, fetcher Fetcher, ch chan string) {
    if depth <= 0 {
        return
    }

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        ch <- fmt.Sprintln(err)
        return
    }

    ch <- fmt.Sprintf("found: %s %q\n", url, body)
    for _, u := range urls {
        go Crawl(u, depth-1, fetcher, ch)
    }
}

func main() {
    ch := make(chan string, 100)
    go Crawl("http://golang.org/", 4, fetcher, ch)

    for i := range ch {
        fmt.Println(i)
    }
}

The issue I have is where to put the close(ch) call. If I put a defer close(ch) somewhere in the Crawl function, I end up writing to a closed channel from one of the spawned goroutines, since the function finishes executing before the goroutines it spawned do.

If I omit the call to close(ch), as in my example code, the program deadlocks: all the goroutines finish executing, but the main goroutine is still blocked receiving on the channel in its range loop, because the channel is never closed.
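One common way out of this dilemma is to keep the single shared channel, count outstanding goroutines with a sync.WaitGroup, and let one extra goroutine close the channel once every Crawl call has returned. This is a minimal sketch, not taken from the answers below; it assumes the tour's Fetcher interface and fetcher value, and it needs the "sync" import:

func Crawl(url string, depth int, fetcher Fetcher, ch chan string, wg *sync.WaitGroup) {
    defer wg.Done() // signal completion on every return path
    if depth <= 0 {
        return
    }
    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        ch <- err.Error()
        return
    }
    ch <- fmt.Sprintf("found: %s %q", url, body)
    for _, u := range urls {
        wg.Add(1) // register the child before it starts
        go Crawl(u, depth-1, fetcher, ch, wg)
    }
}

func main() {
    ch := make(chan string, 100)
    var wg sync.WaitGroup
    wg.Add(1)
    go Crawl("http://golang.org/", 4, fetcher, ch, &wg)
    go func() {
        wg.Wait() // every Crawl call has returned...
        close(ch) // ...so nothing can write to ch anymore
    }()
    for s := range ch { // terminates once ch is closed
        fmt.Println(s)
    }
}

Because each child is Add'ed before its parent calls Done, the counter can only reach zero after the whole crawl tree has finished, so the close happens exactly once and never races with a send.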


11 answers

  • doujiku1028 2012-11-04 22:57

    A look at the Parallelization section of Effective Go suggests ideas for the solution: essentially, you have to close the channel on every return path of the function. This is a nice use case for the defer statement:

    func Crawl(url string, depth int, fetcher Fetcher, ret chan string) {
        // Closing ret on every return path lets the caller's range loop end.
        defer close(ret)
        if depth <= 0 {
            return
        }
    
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            ret <- err.Error()
            return
        }
    
        ret <- fmt.Sprintf("found: %s %q", url, body)
    
        // Give each child crawl its own return channel.
        result := make([]chan string, len(urls))
        for i, u := range urls {
            result[i] = make(chan string)
            go Crawl(u, depth-1, fetcher, result[i])
        }
    
        // Fan the children's results into this call's own channel.
        for i := range result {
            for s := range result[i] {
                ret <- s
            }
        }
    
        return
    }
    
    func main() {
        result := make(chan string)
        go Crawl("http://golang.org/", 4, fetcher, result)
    
        for s := range result {
            fmt.Println(s)
        }
    }
    

    The essential difference from your code is that every instance of Crawl gets its own return channel, and each caller collects its children's results into its own return channel. Note that draining result[i] in index order keeps the output deterministic, while the child fetches themselves still run in parallel.

    This answer was accepted by the asker.
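The second half of the exercise, making the crawler not fetch the same URL twice, which the question defers, can be layered onto either version above. Here is a minimal sketch using a mutex-guarded set; safeSet and its visit method are hypothetical names, not part of the tour's code:

// safeSet records URLs that have already been claimed by some crawl.
type safeSet struct {
    mu   sync.Mutex
    seen map[string]bool
}

// visit reports whether url is new, atomically marking it as seen.
func (s *safeSet) visit(url string) bool {
    s.mu.Lock()
    defer s.mu.Unlock()
    if s.seen[url] {
        return false // someone already fetched (or is fetching) it
    }
    s.seen[url] = true
    return true
}

Construct it once in main, e.g. visited := &safeSet{seen: make(map[string]bool)}, thread it through the recursive calls, and bail out at the top of Crawl (after any deferred cleanup) with: if !visited.visit(url) { return }.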
