drtppp75155 2012-09-01 04:46

Exercise: Web Crawler - concurrency not working

I am going through the golang tour and working on the final exercise to change a web crawler to crawl in parallel and not repeat a crawl ( http://tour.golang.org/#73 ). All I have changed is the crawl function.

    var used = make(map[string]bool)

    func Crawl(url string, depth int, fetcher Fetcher) {
        if depth <= 0 {
            return
        }
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Printf("found: %s %q\n", url, body)
        for _, u := range urls {
            if used[u] == false {
                used[u] = true
                Crawl(u, depth-1, fetcher)
            }
        }
        return
    }

To make it concurrent, I added the go keyword in front of the recursive call to Crawl, but instead of crawling recursively, the program only finds the "http://golang.org/" page and no other pages.

Why doesn't the program work when I add the go keyword to the call to Crawl?
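Concretely, the changed loop presumably looks like this (a sketch based on the code above, with only the go keyword added; everything else is unchanged):

    for _, u := range urls {
        if used[u] == false {
            used[u] = true
            // the recursive call now runs in its own goroutine
            go Crawl(u, depth-1, fetcher)
        }
    }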


2 answers

  • dpbz14739 2012-09-03 15:09

    The problem seems to be that your process exits before all URLs can be followed by the crawler. Because of the concurrency, the main() function returns before the worker goroutines have finished.

    To circumvent this, you could use sync.WaitGroup:

        func Crawl(url string, depth int, fetcher Fetcher, wg *sync.WaitGroup) {
            defer wg.Done()
            if depth <= 0 {
                return
            }
            body, urls, err := fetcher.Fetch(url)
            if err != nil {
                fmt.Println(err)
                return
            }
            fmt.Printf("found: %s %q\n", url, body)
            for _, u := range urls {
                if used[u] == false {
                    used[u] = true
                    wg.Add(1)
                    go Crawl(u, depth-1, fetcher, wg)
                }
            }
            return
        }
    

    And call Crawl in main as follows:

        func main() {
            wg := &sync.WaitGroup{}

            // Account for the initial, synchronous Crawl call, which also
            // calls wg.Done(); without this Add the counter goes negative.
            wg.Add(1)
            Crawl("http://golang.org/", 4, fetcher, wg)

            wg.Wait()
        }
    

    Also, don't rely on the map being thread safe.
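    One way to handle that is to guard the shared used map with a mutex. Below is a minimal sketch, assuming the same package as above with "sync" imported; it replaces the package-level used declaration from the question, and the mu variable and markUsed helper are illustrative names, not part of the original code:

        var (
            used = make(map[string]bool)
            mu   sync.Mutex // protects used against concurrent Crawl goroutines
        )

        // markUsed reports whether url is new and, if so, marks it as seen.
        // All access to the used map goes through this function.
        func markUsed(url string) bool {
            mu.Lock()
            defer mu.Unlock()
            if used[url] {
                return false
            }
            used[url] = true
            return true
        }

    The loop in Crawl would then become "if markUsed(u) { wg.Add(1); go Crawl(u, depth-1, fetcher, wg) }" instead of checking and setting used directly.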

    This answer was accepted by the asker.
