Colly isn't finding any links

I've written a few programs like this before in basically the same way (just for different domains). This time, however, colly doesn't find a single link and quits after visiting the first page. Can anyone see what's wrong? *NOTE: I have omitted parts of the program that aren't relevant to the problem.

*EDIT: I have found the problem but not a solution. Running curl https://trendmicro.com/vinfo/us/security/research-and-analysis/threat-reports in the terminal returns a 301 Moved Permanently response, but opening the same link in the browser loads the page I want. Why is this happening, and how do I fix it?

*EDIT2: I have found that curl -L makes curl follow redirects, and with it curl returns the page I need. How do I do the same thing in colly? Colly is still stopping at the 301.

package main

import (
    "fmt"
    "strings"

    "github.com/gocolly/colly"
)

func main() {
    /* only navigate to links within this path */
    tld1 := "/vinfo/us/security/research-and-analysis/threat-reports"

    c := colly.NewCollector(
        colly.AllowedDomains("trendmicro.com", "documents.trendmicro.com"),
    )

    // Print every link on the page and queue the ones under tld1.
    c.OnHTML("a[href]", func(e *colly.HTMLElement) {
        link := e.Attr("href")
        fmt.Printf("Link found: %q -> %s\n", e.Text, link)
        if strings.Contains(link, tld1) {
            c.Visit(e.Request.AbsoluteURL(link))
        }
    })

    c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting", r.URL.String())
    })

    c.Visit("https://trendmicro.com/vinfo/us/security/research-and-analysis/threat-reports")
}

1 Answer
douxiawei9318 2019-02-14 18:39
I have found the solution. I entered my link https://trendmicro.com/vinfo/us/security/research-and-analysis/threat-reports into https://wheregoes.com/retracer.php to see where the 301 redirects to, and it turns out the redirect just adds www. in front of the domain. Adding www. to the domain in the initial c.Visit URL and in the c.AllowedDomains list worked like a charm.
