dongqufi82315 2017-12-31 13:47
Views: 61


I've been working through the Tour of Go exercises and I'm stuck on the web crawler. I thought I had finished it, but the output is inconsistent and I don't have enough concurrency experience to figure out why.

Here's my code:

package main

import (
    "fmt"
    "sync"
)

type Fetcher interface {
    // Fetch returns the body of URL and
    // a slice of URLs found on that page.
    Fetch(url string) (body string, urls []string, err error)
}

// cache records which URLs have been fetched; the mutex guards the map.
var cache = struct {
    fetched map[string]bool
    sync.Mutex
}{fetched: make(map[string]bool)}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, c chan []string, quit chan int) {
    if depth <= 0 {
        return
    }
    go safeVisit(url, c, quit, fetcher)
    for {
        select {
        case <-quit:
            return
        case u := <-c:
            for _, v := range u {
                go Crawl(v, depth-1, fetcher, c, quit)
            }
        }
    }
}

func main() {
    c := make(chan []string)
    quit := make(chan int)
    Crawl("", 4, fetcher, c, quit)
}

func safeVisit(url string, c chan []string, quit chan int, fetcher Fetcher) {
    cache.Lock()
    defer cache.Unlock()
    if _, ok := cache.fetched[url]; ok {
        quit <- 0
        return
    }
    body, urls, err := fetcher.Fetch(url)
    cache.fetched[url] = true
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("Visited : %s, %q \n", url, body)
    c <- urls
}


// fakeFetcher is Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
    body string
    urls []string
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
    if res, ok := f[url]; ok {
        return res.body, res.urls, nil
    }
    return "", nil, fmt.Errorf("not found: %s", url)
}

// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
    "": &fakeResult{
        "The Go Programming Language",
    "": &fakeResult{
    "": &fakeResult{
        "Package fmt",
    "": &fakeResult{
        "Package os",

Here's some sample output:

Visited :, "The Go Programming Language" 
not found:
Visited :, "Packages" 
Visited :, "Package os" 
**Visited :, "Package fmt"** 

Process finished with exit code 0

In other runs the last package is missing (the line deliberately marked with asterisks above):

Visited :, "The Go Programming Language" 
not found:
Visited :, "Packages" 
Visited :, "Package os"

And finally, even a deadlock in some runs:

Visited :, "The Go Programming Language" 
not found:
Visited :, "Packages" 
Visited :, "Package os" 
Visited :, "Package fmt" 
fatal error: all goroutines are asleep - deadlock!

goroutine 1 [select]:
main.Crawl(0x4bfdf9, 0x12, 0x4, 0x524220, 0xc420088120, 0xc420092000, 0xc420092060)
    /home/kostas/development/challenges/go/helloWorld.go:26 +0x201
    /home/kostas/development/challenges/go/helloWorld.go:39 +0xab

goroutine 23 [select]:
main.Crawl(0x4bfdf9, 0x12, 0x3, 0x524220, 0xc420088120, 0xc420092000, 0xc420092060)
    /home/kostas/development/challenges/go/helloWorld.go:26 +0x201
created by main.Crawl
    /home/kostas/development/challenges/go/helloWorld.go:31 +0x123

goroutine 24 [select]:
main.Crawl(0x4c09f9, 0x16, 0x3, 0x524220, 0xc420088120, 0xc420092000, 0xc420092060)
    /home/kostas/development/challenges/go/helloWorld.go:26 +0x201
created by main.Crawl
    /home/kostas/development/challenges/go/helloWorld.go:31 +0x123

goroutine 5 [select]:
main.Crawl(0x4bfdf9, 0x12, 0x3, 0x524220, 0xc420088120, 0xc420092000, 0xc420092060)
    /home/kostas/development/challenges/go/helloWorld.go:26 +0x201
created by main.Crawl
    /home/kostas/development/challenges/go/helloWorld.go:31 +0x123

goroutine 6 [select]:
main.Crawl(0x4c0a0f, 0x16, 0x3, 0x524220, 0xc420088120, 0xc420092000, 0xc420092060)
    /home/kostas/development/challenges/go/helloWorld.go:26 +0x201
created by main.Crawl
    /home/kostas/development/challenges/go/helloWorld.go:31 +0x123

I assume it has something to do with concurrency and recursion. I've seen other solutions on GitHub that use sync.WaitGroup and the like, but the Tour of Go hasn't introduced it at this point, so I'd rather not use it yet.


I figured out what is going on and am working on a fix. Basically, the select statement sometimes gets stuck in an endless loop because messages on the quit and c channels don't always arrive in the expected order. I added a default case that prints "nothing to do", and the program sometimes looped forever and sometimes, by luck, finished correctly. My exit condition is not right.
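To illustrate why the default case changes the behaviour: with a default branch, select never blocks, so a surrounding for loop keeps spinning whenever neither channel is ready. The sketch below is a minimal, standalone illustration; tryReceive is a hypothetical helper and not part of the crawler above.

```go
package main

import "fmt"

// tryReceive (hypothetical helper) reports whether a value was ready on c.
// Because of the default case, the select returns immediately instead of blocking.
func tryReceive(c chan []string) ([]string, bool) {
	select {
	case u := <-c:
		return u, true
	default:
		return nil, false
	}
}

func main() {
	c := make(chan []string, 1)

	_, ok := tryReceive(c)
	fmt.Println(ok) // false: nothing was ready, so default ran

	c <- []string{"a"}
	u, ok := tryReceive(c)
	fmt.Println(u, ok) // [a] true
}
```

Without the default case the same select would simply wait until one of its channels became ready.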


1 answer

  • dtdfj08626 2017-12-31 15:10

    I think the cause is quite clear: your channels are tangled. Multiple goroutines are receiving from the same channel, and Go delivers each value to just one of them, chosen effectively at random.

    When you send a zero on quit, you never know which goroutine quits: the receiver is picked by the Go runtime. A newly started Crawl may receive from quit before it receives from c, even if both channels are ready.

    Because of that, the depth bookkeeping falls apart and the number of safeVisit calls varies from run to run, so the number of quit signals sent varies too. Sometimes there are not enough signals to stop every goroutine that was started, and the result is a deadlock.
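A small sketch of that behaviour, assuming nothing beyond the standard library: several goroutines receive from one shared channel, and each value reaches exactly one of them. fanOut is a hypothetical helper written for this illustration only.

```go
package main

import "fmt"

// fanOut (hypothetical helper) starts `workers` goroutines that all receive
// from one shared channel, sends the values 0..n-1 on it, and returns how
// many values each worker received.
func fanOut(workers, n int) []int {
	shared := make(chan int)
	done := make(chan [2]int) // {worker id, number of values received}

	for id := 0; id < workers; id++ {
		go func(id int) {
			count := 0
			for range shared { // each value is delivered to exactly one receiver
				count++
			}
			done <- [2]int{id, count}
		}(id)
	}

	for v := 0; v < n; v++ {
		shared <- v
	}
	close(shared) // lets every worker's range loop end

	counts := make([]int, workers)
	for i := 0; i < workers; i++ {
		r := <-done
		counts[r[0]] = r[1]
	}
	return counts
}

func main() {
	// The split between workers can differ from run to run; only the total is fixed.
	fmt.Println(fanOut(3, 12))
}
```

The same applies to quit in the question: one send wakes one receiver, and nothing guarantees it is the Crawl that started the corresponding safeVisit.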


    First you should understand what your task is. The Crawl function takes a url, a depth and a fetcher, and it:

    1. Fetches the url
    2. Prints the fetched body
    3. Starts a new Crawl, with depth-1, for each URL found on the fetched page

    Although the tour asks you to fetch URLs in parallel, it is clear that steps 2 and 3 must happen after step 1, so it is normal for a single Crawl to wait for its own fetch. That means there is no need for a separate goroutine just to call Fetch.

    In step 3, each new Crawl call has no need to wait for the previous one to finish, so these calls should run in parallel.

    From this analysis, one arrives at the following code:

    func Crawl(url string, depth int, fetcher Fetcher) {
        // TODO: Fetch URLs in parallel.
        // TODO: Don't fetch the same URL twice.
        // This implementation doesn't do either:
        if depth <= 0 {
            return
        }
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Printf("found: %s %q\n", url, body)
        for _, u := range urls {
            go Crawl(u, depth-1, fetcher)
        }
    }
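One point the snippet above leaves open is how the caller knows when all the child crawls have finished; with a bare `go Crawl(...)`, main can return before they have run. Below is one possible sketch that uses only channels (no sync.WaitGroup): each crawl waits for its own children and then reports on a done channel. The links map, crawl and crawlAll are made-up stand-ins for this example, and duplicate checking is left out to keep it short.

```go
package main

import "fmt"

// links is a made-up stand-in for the fetcher: page name -> links on that page.
var links = map[string][]string{
	"a": {"b", "c"},
	"b": {"d"},
	"c": {},
	"d": {},
}

// crawl reports url on out, starts one goroutine per link, and sends on
// done only after every child has reported back.
func crawl(url string, depth int, out chan<- string, done chan<- bool) {
	defer func() { done <- true }() // always tell the parent we are finished
	if depth <= 0 {
		return
	}
	out <- url
	childDone := make(chan bool)
	for _, u := range links[url] {
		go crawl(u, depth-1, out, childDone)
	}
	for range links[url] {
		<-childDone // one receive per child started above
	}
}

// crawlAll collects every visited page and returns once the root crawl is done.
func crawlAll(root string, depth int) []string {
	out := make(chan string)
	done := make(chan bool)
	go crawl(root, depth, out, done)

	var pages []string
	for {
		select {
		case p := <-out:
			pages = append(pages, p)
		case <-done:
			return pages
		}
	}
}

func main() {
	fmt.Println(len(crawlAll("a", 4))) // 4
}
```

Because every crawl has its own childDone channel, a completion signal can only reach the parent that is waiting for it, which avoids the shared-quit problem described earlier.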

    There is one more problem: handling a URL that has already been visited. You have handled that part well; instead of sending on quit, just make the check a func(string) bool and call it directly: if Visited(url) { return }, and it is done.

    A side note: the tour is really not good at teaching concurrency. You may want to read the Go blog articles, such as "Go Concurrency Patterns" or "Share Memory By Communicating".
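A minimal sketch of such a Visited function, assuming a mutex-guarded map like the cache in the question. The exact shape (a check-and-mark in one locked step) is one reasonable choice, not the only one.

```go
package main

import (
	"fmt"
	"sync"
)

// cache mirrors the structure from the question: a map guarded by a mutex.
var cache = struct {
	sync.Mutex
	fetched map[string]bool
}{fetched: make(map[string]bool)}

// Visited marks url as seen and reports whether it had been seen before.
// Checking and marking under one lock means only one caller gets false per URL.
func Visited(url string) bool {
	cache.Lock()
	defer cache.Unlock()
	if cache.fetched[url] {
		return true
	}
	cache.fetched[url] = true
	return false
}

func main() {
	fmt.Println(Visited("a"), Visited("a"), Visited("b")) // false true false
}
```

Note that the lock is held only for the map access, not for the fetch itself, so different URLs can still be fetched in parallel.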



