douguluan5102 2017-09-24 03:01
Accepted

Goroutines do not take effect in the Crawl example of "A Tour of Go"

Following the hints in the Crawl exercise of 'A Tour of Go', I modified the Crawl function. I wonder why 'go Crawl' fails to spawn another goroutine: only one URL is printed.

Is there anything wrong with my modification?

My modification is listed below:

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
    // TODO: Fetch URLs in parallel.
    // TODO: Don't fetch the same URL twice.
    // This implementation doesn't do either:
    if depth <= 0 {
        fmt.Printf("depth <= 0 return")
        return
    }
    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("found: %s %q\n", url, body)
    crawled.mux.Lock()
    crawled.c[url]++
    crawled.mux.Unlock()
    for _, u := range urls {
        //crawled.mux.Lock()
        if cnt, ok := crawled.c[u]; ok {
            cnt++
        } else {
            fmt.Println("go ...", u)
            go Crawl(u, depth-1, fetcher)
        }
        //crawled.mux.Unlock()
        //Crawl(u, depth-1, fetcher)
    }
    return
}


type crawledUrl struct {
    c   map[string]int
    mux sync.Mutex
}

var crawled = crawledUrl{c: make(map[string]int)}

1 answer

  • douzhong1730 2017-09-24 10:22

    Your program has no synchronization between its goroutines.

    So the main goroutine most likely returns, and the program exits, before the spawned goroutines get a chance to run. Reading crawled.c without holding the mutex is also a data race.

    Remember that the main goroutine never waits for other goroutines to finish unless you explicitly synchronize with them,

    for example with channels or the utilities in the sync package.

    Here is a version of the crawler that waits for its goroutines:

    type fetchState struct {
        mu      sync.Mutex
        fetched map[string]bool
    }
    
    // CheckAndMark reports whether url was already fetched,
    // and marks it as fetched if it was not.
    func (f *fetchState) CheckAndMark(url string) bool {
        f.mu.Lock()
        defer f.mu.Unlock()
    
        if f.fetched[url] {
            return true
        }
        f.fetched[url] = true
        return false
    }
    
    func mkFetchState() *fetchState {
        f := &fetchState{}
        f.fetched = make(map[string]bool)
        return f
    }
    
    func CrawlConcurrentMutex(url string, fetcher Fetcher, f *fetchState) {
        if f.CheckAndMark(url) {
            return
        }
    
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Printf("found: %s %q\n", url, body)
        var done sync.WaitGroup
        for _, u := range urls {
            done.Add(1)
            go func(u string) {
                defer done.Done()
                CrawlConcurrentMutex(u, fetcher, f)
            }(u) // pass u explicitly; before Go 1.22, capturing the loop variable directly is a race
        }
        done.Wait()
        return
    }
    

    Pay attention to how sync.WaitGroup is used here; its documentation explains the details.
