Goroutines have no effect in the Crawl example of "A Tour of Go"

Following the hints in the Crawl example of 'A Tour of Go', I modified the Crawl function, and I wonder why 'go Crawl' fails to spawn another goroutine: only one URL is printed.

Is there anything wrong with my modification?

My modification is listed below:

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
    // TODO: Fetch URLs in parallel.
    // TODO: Don't fetch the same URL twice.
    // This implementation doesn't do either:
    if depth <= 0 {
        fmt.Printf("depth <= 0 return")
        return
    }
    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("found: %s %q\n", url, body)
    crawled.mux.Lock()
    crawled.c[url]++
    crawled.mux.Unlock()
    for _, u := range urls {
        //crawled.mux.Lock()
        if cnt, ok := crawled.c[u]; ok {
            cnt++
        } else {
            fmt.Println("go ...", u)
            go Crawl(u, depth-1, fetcher)
        }
        //crawled.mux.Unlock()
        //Crawl(u, depth-1, fetcher)
    }
    return
}


type crawledUrl struct {
    c   map[string]int
    mux sync.Mutex
}

var crawled = crawledUrl{c: make(map[string]int)}

douzhong1730 answered on 2017-09-24 10:22:
Your program has no synchronization between its goroutines.

As a result, its behavior is unpredictable; most likely the main goroutine returns, and the program exits, before the spawned goroutines get a chance to run.

Remember that the main goroutine never waits for other goroutines to finish unless you explicitly synchronize with them,

for example with channels or the utilities in the sync package.
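To illustrate the channel option mentioned above, here is a minimal sketch, not taken from the Tour. The names worker and collect are hypothetical stand-ins for Crawl and its caller. The caller blocks on a channel receive once per goroutine it started, so it cannot return before they have all reported back.

```go
package main

import "fmt"

// worker is a hypothetical stand-in for a call such as Crawl.
// It reports its result on the channel instead of just returning.
func worker(id int, results chan<- string) {
	results <- fmt.Sprintf("worker %d done", id)
}

// collect starts n workers and blocks until each one has reported back.
func collect(n int) []string {
	results := make(chan string)
	for i := 0; i < n; i++ {
		go worker(i, results)
	}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		// Receiving blocks until some worker sends, so collect
		// cannot return while workers are still pending.
		out = append(out, <-results)
	}
	return out
}

func main() {
	for _, line := range collect(3) {
		fmt.Println(line)
	}
}
```

The order of the lines is not fixed, since the goroutines run concurrently; only the number of results is guaranteed.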

Here is one working version:

type fetchState struct {
    mu      sync.Mutex
    fetched map[string]bool
}

func (f *fetchState) CheckAndMark(url string) bool {
    f.mu.Lock()
    defer f.mu.Unlock()
    if f.fetched[url] {
        return true
    }
    f.fetched[url] = true
    return false
}

func mkFetchState() *fetchState {
    f := &fetchState{}
    f.fetched = make(map[string]bool)
    return f
}

func CrawlConcurrentMutex(url string, fetcher Fetcher, f *fetchState) {
    if f.CheckAndMark(url) {
        return
    }
    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("found: %s %q\n", url, body)
    var done sync.WaitGroup
    for _, u := range urls {
        done.Add(1)
        go func(u string) {
            defer done.Done()
            CrawlConcurrentMutex(u, fetcher, f)
        }(u) // Without the u argument there is a race
    }
    done.Wait()
    return
}

Pay particular attention to how sync.WaitGroup is used; its documentation explains the details.
This answer was accepted by the asker as the best answer.
