duanqie5741
2012-11-04 09:49

Trouble with the Go tour web crawler exercise

I'm going through the Go tour and I feel like I have a pretty good understanding of the language, except for concurrency.

On slide 72 there is an exercise that asks the reader to parallelize a web crawler (and to avoid fetching the same URL twice, but I haven't gotten to that part yet).

Here is what I have so far:

func Crawl(url string, depth int, fetcher Fetcher, ch chan string) {
    if depth <= 0 {
        return
    }

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        ch <- fmt.Sprintln(err)
        return
    }

    ch <- fmt.Sprintf("found: %s %q\n", url, body)
    for _, u := range urls {
        go Crawl(u, depth-1, fetcher, ch)
    }
}

func main() {
    ch := make(chan string, 100)
    go Crawl("http://golang.org/", 4, fetcher, ch)

    for i := range ch {
        fmt.Println(i)
    }
}

The issue I have is where to put the close(ch) call. If I put a defer close(ch) somewhere in the Crawl function, then I end up writing to a closed channel from one of the spawned goroutines, since the function returns before the spawned goroutines finish.

If I omit the call to close(ch), as in my example code, the program deadlocks: all the goroutines finish, but the main goroutine is still waiting to receive on the channel in the for loop, because the channel was never closed.
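One common pattern for this (a sketch, not from the answers below): pair the channel with a sync.WaitGroup, bump the counter before each go statement, and close the channel from a dedicated goroutine once wg.Wait() returns. fetchChildren here is a hypothetical stand-in for the tour's fetcher.Fetch.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchChildren is a hypothetical stand-in for fetcher.Fetch from
// the tour: it returns the child "URLs" of a page.
func fetchChildren(url string) []string {
	graph := map[string][]string{
		"a": {"b", "c"},
		"b": {"d"},
	}
	return graph[url]
}

// crawl sends every visited URL on ch. wg.Add(1) is called before
// each goroutine is spawned, never inside it.
func crawl(url string, depth int, ch chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	if depth <= 0 {
		return
	}
	ch <- url
	for _, u := range fetchChildren(url) {
		wg.Add(1)
		go crawl(u, depth-1, ch, wg)
	}
}

// CrawlAll runs crawl and closes the channel exactly once, after
// every goroutine has finished, so ranging over it terminates.
func CrawlAll(start string, depth int) []string {
	ch := make(chan string, 100)
	var wg sync.WaitGroup
	wg.Add(1)
	go crawl(start, depth, ch, &wg)
	go func() {
		wg.Wait() // every crawl goroutine has returned...
		close(ch) // ...so nothing can send on ch anymore
	}()
	var got []string
	for s := range ch {
		got = append(got, s)
	}
	sort.Strings(got)
	return got
}

func main() {
	fmt.Println(CrawlAll("a", 3)) // [a b c d]
}
```

Since close runs strictly after every crawl goroutine has called wg.Done(), nothing can send on a closed channel, and the range loop terminates instead of deadlocking.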


11 answers

  • doujiku1028 · 9 years ago

    A look at the Parallelization section of Effective Go suggests the solution. Essentially, you have to close the channel on every return path of the function, which is a nice use case for the defer statement:

    func Crawl(url string, depth int, fetcher Fetcher, ret chan string) {
        defer close(ret)
        if depth <= 0 {
            return
        }
    
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            ret <- err.Error()
            return
        }
    
        ret <- fmt.Sprintf("found: %s %q", url, body)
    
        result := make([]chan string, len(urls))
        for i, u := range urls {
            result[i] = make(chan string)
            go Crawl(u, depth-1, fetcher, result[i])
        }
    
        for i := range result {
            for s := range result[i] {
                ret <- s
            }
        }
    
        return
    }
    
    func main() {
        result := make(chan string)
        go Crawl("http://golang.org/", 4, fetcher, result)
    
        for s := range result {
            fmt.Println(s)
        }
    }
    

    The essential difference from your code is that every instance of Crawl gets its own return channel, and the calling function collects the results into its own return channel.

  • doumei3828 · 2 years ago

    Below is my solution. Apart from the global map, I only had to change the body of Crawl. Like other solutions, I used sync.Map and sync.WaitGroup. I've boxed off the important parts.

    var m sync.Map
    
    // Crawl uses fetcher to recursively crawl
    // pages starting with url, to a maximum of depth.
    func Crawl(url string, depth int, fetcher Fetcher) {
        if depth <= 0 {
            return
        }
        // Don't fetch the same URL twice.
        /////////////////////////////////////
        _, ok := m.LoadOrStore(url, url)   //
        if ok {                            //
            return                         //
        }                                  //
        /////////////////////////////////////
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Printf("found: %s %q\n", url, body)
        // Fetch URLs in parallel.
        /////////////////////////////////////
        var wg sync.WaitGroup              //
        defer wg.Wait()                    //
        for _, u := range urls {           //
            wg.Add(1)                      //
            go func(u string) {            //
                defer wg.Done()            //
                Crawl(u, depth-1, fetcher) //
            }(u)                           //
        }                                  //
        /////////////////////////////////////
        return
    }
    
  • dongxianghui3709 · 2 years ago

    Similar idea to the accepted answer, but with no duplicate URLs fetched, and printing directly to the console. defer is not used either. We use channels to signal when goroutines complete. The SafeMap idea is lifted from the SafeCounter given earlier in the tour.

    For the child goroutines, we create a slice of channels and wait until every child returns by receiving on its channel.

    package main
    
    import (
        "fmt"
        "sync"
    )
    
    // SafeMap is safe to use concurrently.
    type SafeMap struct {
        v   map[string] bool
        mux sync.Mutex
    }
    
    // SetVal sets the value for the given key.
    func (m *SafeMap) SetVal(key string, val bool) {
        m.mux.Lock()
        // Lock so only one goroutine at a time can access the map c.v.
        m.v[key] = val
        m.mux.Unlock()
    }
    
    // Value returns the current value of the counter for the given key.
    func (m *SafeMap) GetVal(key string) bool {
        m.mux.Lock()
        // Lock so only one goroutine at a time can access the map c.v.
        defer m.mux.Unlock()
        return m.v[key]
    }
    
    type Fetcher interface {
        // Fetch returns the body of URL and
        // a slice of URLs found on that page.
        Fetch(url string) (body string, urls []string, err error)
    }
    
    // Crawl uses fetcher to recursively crawl
    // pages starting with url, to a maximum of depth.
    // urlMap is a pointer: copying a SafeMap would copy its sync.Mutex,
    // so the goroutines would no longer exclude each other.
    func Crawl(url string, depth int, fetcher Fetcher, status chan bool, urlMap *SafeMap) {

        // Check if we fetched this url previously.
        if ok := urlMap.GetVal(url); ok {
            status <- true
            return
        }

        // Mark this url as fetched.
        urlMap.SetVal(url, true)

        if depth <= 0 {
            status <- false
            return
        }

        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Println(err)
            status <- false
            return
        }

        fmt.Printf("found: %s %q\n", url, body)

        statuses := make([]chan bool, len(urls))
        for index, u := range urls {
            statuses[index] = make(chan bool)
            go Crawl(u, depth-1, fetcher, statuses[index], urlMap)
        }

        // Wait for child goroutines.
        for _, childstatus := range statuses {
            <-childstatus
        }

        // And now this goroutine can finish.
        status <- true
    }

    func main() {
        urlMap := SafeMap{v: make(map[string]bool)}
        status := make(chan bool)
        go Crawl("https://golang.org/", 4, fetcher, status, &urlMap)
        <-status
    }
    
    
  • doucong1268 · 2 years ago

    Below is a simple solution for parallelization using only a sync.WaitGroup.

    var fetchedUrlMap = make(map[string]bool)
    var mutex sync.Mutex
    
    func Crawl(url string, depth int, fetcher Fetcher) {
        if depth <= 0 {
            return
        }

        // Check and mark the url under one lock, so that two
        // goroutines cannot both decide to fetch it.
        mutex.Lock()
        if fetchedUrlMap[url] {
            mutex.Unlock()
            return
        }
        fetchedUrlMap[url] = true
        mutex.Unlock()

        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Printf("found: %s %q\n", url, body)
    
        var wg sync.WaitGroup
        for _, u := range urls {
            //  fmt.Println("Solving for ", u)
            wg.Add(1)
            go func(uv string) {
                Crawl(uv, depth-1, fetcher)
                wg.Done()
            }(u)
        }
        wg.Wait()
    }

    
  • douzhong3038 · 4 years ago

    Here's my solution, using sync.WaitGroup and a SafeCache of fetched urls:

    package main
    
    import (
        "fmt"
        "sync"
    )
    
    type Fetcher interface {
        // Fetch returns the body of URL and
        // a slice of URLs found on that page.
        Fetch(url string) (body string, urls []string, err error)
    }
    
    // Safe to use concurrently
    type SafeCache struct {
        fetched map[string]string
        mux     sync.Mutex
    }
    
    func (c *SafeCache) Add(url, body string) {
        c.mux.Lock()
        defer c.mux.Unlock()
    
        if _, ok := c.fetched[url]; !ok {
            c.fetched[url] = body
        }
    }
    
    func (c *SafeCache) Contains(url string) bool {
        c.mux.Lock()
        defer c.mux.Unlock()
    
        _, ok := c.fetched[url]
        return ok
    }
    
    // Crawl uses fetcher to recursively crawl
    // pages starting with url, to a maximum of depth.
    // cache is a pointer: copying a SafeCache would copy its sync.Mutex.
    func Crawl(url string, depth int, fetcher Fetcher, cache *SafeCache,
        wg *sync.WaitGroup) {

        defer wg.Done()
        if depth <= 0 {
            return
        }
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Printf("found: %s %q\n", url, body)
        cache.Add(url, body)
        for _, u := range urls {
            if !cache.Contains(u) {
                wg.Add(1)
                go Crawl(u, depth-1, fetcher, cache, wg)
            }
        }
    }

    func main() {
        cache := SafeCache{fetched: make(map[string]string)}
        var wg sync.WaitGroup

        wg.Add(1)
        Crawl("http://golang.org/", 4, fetcher, &cache, &wg)
        wg.Wait()
    }
    
  • dongmu6578 · 5 years ago

    I use a slice to avoid crawling the same url twice. The recursive version without concurrency is fine, but I'm not sure about this concurrent version.

    func Crawl(url string, depth int, fetcher Fetcher) {
        var str_arrs []string
        var mux sync.Mutex
    
        var crawl func(string, int)
        crawl = func(url string, depth int) {
            if depth <= 0 {
                return
            }
    
            mux.Lock()
            for _, v := range str_arrs {
                if url == v {
                    mux.Unlock()
                    return
                }
            }
            str_arrs = append(str_arrs, url)
            mux.Unlock()
    
            body, urls, err := fetcher.Fetch(url)
            if err != nil {
                fmt.Println(err)
                return
            }
            fmt.Printf("found: %s %q\n", url, body)
            for _, u := range urls {
            go crawl(u, depth-1) // delete "go" to make it plain recursion
            }
        }
    
        crawl(url, depth)
        return
    }
    
    func main() {
        Crawl("http://golang.org/", 4, fetcher)
    }
    
  • dtest84004 · 5 years ago

    I have implemented it with a single channel, where all the goroutines send their messages. To ensure that it is closed when there are no more goroutines, I use a safe counter that closes the channel when the counter reaches 0.

    type Msg struct {
        url string
        body string
    }
    
    type SafeCounter struct {
        v int
        mux sync.Mutex
    }
    
    func (c *SafeCounter) inc() {
        c.mux.Lock()
        defer c.mux.Unlock()
        c.v++   
    }
    
    func (c *SafeCounter) dec(ch chan Msg) {
        c.mux.Lock()
        defer c.mux.Unlock()
        c.v--
        if c.v == 0 {
            close(ch)
        }
    }
    
    var goes SafeCounter = SafeCounter{v: 0}

    // SafeCache supplies the cache used in Crawl below: existsAndRegister
    // reports whether url was already seen, and marks it seen if not,
    // all under one lock.
    type SafeCache struct {
        seen map[string]bool
        mux  sync.Mutex
    }

    func (c *SafeCache) existsAndRegister(url string) bool {
        c.mux.Lock()
        defer c.mux.Unlock()
        if c.seen[url] {
            return true
        }
        c.seen[url] = true
        return false
    }

    var cache = SafeCache{seen: make(map[string]bool)}
    
    // Crawl uses fetcher to recursively crawl
    // pages starting with url, to a maximum of depth.
    func Crawl(url string, depth int, fetcher Fetcher, ch chan Msg) {
        defer goes.dec(ch)
        if depth <= 0 {
            return
        }
        if !cache.existsAndRegister(url) {
            body, urls, err := fetcher.Fetch(url)
            if err != nil {
                fmt.Println(err)
                return
            }
            ch <- Msg{url, body}
            for _, u := range urls {
                goes.inc()
                go Crawl(u, depth-1, fetcher, ch)
            }
        }
        return
    }
    
    func main() {
        ch := make(chan Msg, 100)
        goes.inc()
        go Crawl("http://golang.org/", 4, fetcher, ch)
        for m := range ch {
            fmt.Printf("found: %s %q\n", m.url, m.body)
        }
    }
    

    Note that the safe counter must be incremented outside of the goroutine.
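A minimal runnable sketch of that point, with illustrative names (emit, countMessages) that are not from the answer: the increment has to happen in the parent, before the go statement, or the parent's dec could drive the counter to 0 and close the channel before the child registered itself.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeCounter mirrors the counter in the answer above.
type SafeCounter struct {
	v   int
	mux sync.Mutex
}

func (c *SafeCounter) inc() {
	c.mux.Lock()
	c.v++
	c.mux.Unlock()
}

func (c *SafeCounter) dec(ch chan string) {
	c.mux.Lock()
	defer c.mux.Unlock()
	c.v--
	if c.v == 0 {
		close(ch) // the last goroutine to finish closes the channel
	}
}

// emit sends one message, then spawns one goroutine per entry in spawn.
func emit(msg string, spawn []string, goes *SafeCounter, ch chan string) {
	defer goes.dec(ch)
	ch <- msg
	for _, m := range spawn {
		// inc runs in the PARENT, before `go`. If the child called
		// inc itself, the parent's dec could see v == 0 and close ch
		// before the child had registered, and the child would then
		// send on a closed channel.
		goes.inc()
		go emit(m, nil, goes, ch)
	}
}

// countMessages drains the channel and returns how many messages arrived.
func countMessages() int {
	goes := &SafeCounter{}
	ch := make(chan string, 10)
	goes.inc()
	go emit("root", []string{"x", "y"}, goes, ch)
	n := 0
	for range ch {
		n++
	}
	return n
}

func main() {
	fmt.Println(countMessages()) // 3
}
```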

  • dph19153 · 2 years ago

    I think using a map (the same way we could use a set in other languages) and a mutex is the easiest approach:

    func Crawl(url string, depth int, fetcher Fetcher) {
        mux.Lock()
        defer mux.Unlock()
        if depth <= 0 || IsVisited(url) {
            return
        }
        visit[url] = true
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Printf("found: %s %q\n", url, body)
        for _, u := range urls {
            //
            go Crawl(u, depth-1, fetcher)
        }
        return
    }
    
    func IsVisited(s string) bool {
        _, ok := visit[s]
        return ok
    }
    
    var mux sync.Mutex
    
    var visit = make(map[string]bool)
    
    func main() {
        Crawl("https://golang.org/", 4, fetcher)
        time.Sleep(time.Second)
    }
    
  • douganbi7686 · 7 years ago

    Here's my solution. I have a "master" goroutine that listens on a channel of urls and starts a new crawling goroutine (which sends the urls it finds into the channel) whenever it sees new urls to crawl.

    Instead of explicitly closing the channel, I keep a counter of unfinished crawling goroutines; when the counter reaches 0, the loop exits because there is nothing left to wait for.

    func doCrawl(url string, fetcher Fetcher, results chan []string) {
        body, urls, err := fetcher.Fetch(url)
        results <- urls
    
        if err != nil {
            fmt.Println(err)
        } else {
            fmt.Printf("found: %s %q\n", url, body)
        }
    }
    
    
    
    func Crawl(url string, depth int, fetcher Fetcher) {
        results := make(chan []string)
        crawled := make(map[string]bool)
        go doCrawl(url, fetcher, results)
        // counter for unfinished crawling goroutines
        toWait := 1
    
        for urls := range results {
            toWait--
    
            for _, u := range urls {
                if !crawled[u] {
                    crawled[u] = true
                    go doCrawl(u, fetcher, results)
                    toWait++
                }
            }
    
            if toWait == 0 {
                break
            }
        }
    }
    
  • doucang2831 · 5 years ago

    An O(1) lookup of the url in a map, instead of an O(n) scan over a slice of all visited urls, minimizes the time spent inside the critical section. That is a trivial amount of time for this example, but it becomes relevant at scale.

    A WaitGroup prevents the top-level Crawl() function from returning until all child goroutines are complete.

    func Crawl(url string, depth int, fetcher Fetcher) {
        var str_map = make(map[string]bool)
        var mux sync.Mutex
        var wg sync.WaitGroup
    
        var crawler func(string,int)
        crawler = func(url string, depth int) {
            defer wg.Done()
    
            if depth <= 0 {
                return
            }   
    
            mux.Lock()
            if _, ok := str_map[url]; ok {
                mux.Unlock()
                return
            }
            str_map[url] = true
            mux.Unlock()
    
            body, urls, err := fetcher.Fetch(url)
            if err != nil {
                fmt.Println(err)
                return
            }
            fmt.Printf("found: %s %q %q\n", url, body, urls)
    
            for _, u := range urls {
                wg.Add(1)
                go crawler(u, depth-1)          
            }       
        }
        wg.Add(1)
        crawler(url,depth)
        wg.Wait()   
    }
    
    func main() {
        Crawl("http://golang.org/", 4, fetcher)
    }
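    The O(1)-map versus O(n)-slice point can be sketched in isolation; the helper names below are illustrative, not from the answer's code.

```go
package main

import "fmt"

// seenInSlice scans the whole slice in the worst case: the cost of
// each membership test grows with the number of visited urls.
func seenInSlice(urls []string, url string) bool {
	for _, v := range urls {
		if v == url {
			return true
		}
	}
	return false
}

// seenInMap is a single hash lookup, independent of len(urls).
func seenInMap(urls map[string]bool, url string) bool {
	return urls[url]
}

func main() {
	s := []string{"a", "b", "c"}
	m := map[string]bool{"a": true, "b": true, "c": true}
	fmt.Println(seenInSlice(s, "c"), seenInMap(m, "c")) // true true
}
```

    For the tour's tiny URL set either is fine; the map only pays off as the visited set grows.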
    
  • dshw124502 · 5 years ago

    I went in a completely different direction with this one. I might have been misled by the tip about using a map.

    // SafeUrlMap is safe to use concurrently.
    type SafeUrlMap struct {
        v   map[string]string
        mux sync.Mutex
    }
    
    func (c *SafeUrlMap) Set(key string, body string) {
        c.mux.Lock()
        // Lock so only one goroutine at a time can access the map c.v.
        c.v[key] = body
        c.mux.Unlock()
    }
    
    // Value returns mapped value for the given key.
    func (c *SafeUrlMap) Value(key string) (string, bool) {
        c.mux.Lock()
        // Lock so only one goroutine at a time can access the map c.v.
        defer c.mux.Unlock()
        val, ok := c.v[key]
        return val, ok
    }
    
    // Crawl uses fetcher to recursively crawl
    // pages starting with url, to a maximum of depth.
    // urlMap is a pointer: copying a SafeUrlMap would copy its sync.Mutex.
    func Crawl(url string, depth int, fetcher Fetcher, urlMap *SafeUrlMap) {
        defer wg.Done()

        if depth <= 0 {
            return
        }

        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Println(err)
            return
        }
        // Record the body only after a successful fetch.
        urlMap.Set(url, body)

        for _, u := range urls {
            if _, ok := urlMap.Value(u); !ok {
                wg.Add(1)
                go Crawl(u, depth-1, fetcher, urlMap)
            }
        }
    }

    var wg sync.WaitGroup

    func main() {
        urlMap := SafeUrlMap{v: make(map[string]string)}

        wg.Add(1)
        go Crawl("http://golang.org/", 4, fetcher, &urlMap)
        wg.Wait()

        for url := range urlMap.v {
            body, _ := urlMap.Value(url)
            fmt.Printf("found: %s %q\n", url, body)
        }
    }
    
