I've done a few programs like this before in basically the same fashion (just different domains), however this time, colly isn't finding a single link and just quits after visiting the first page. Can anyone see what's wrong? *NOTE: there are parts of the program I have omitted for clarity about the topic at hand.
*EDIT: I have found the problem but not a solution. Running curl https://trendmicro.com/vinfo/us/security/research-and-analysis/threat-reports
returns a 301 permanently moved error in the terminal, but connecting to the same link in the browser gets the page I want. Why is THIS happening and how do I fix it?
*EDIT2: I have found that making the command curl -L
makes curl follow redirects - which then spits out the webpage I need. However, how do I translate that to colly? Because colly is still picking up the 301 error.
import (
"fmt"
"strings"
"github.com/gocolly/colly"
)
func main() {
/* only navigate to links within these paths */
tld1 := "/vinfo/us/security/research-and-analysis/threat-reports"
c := colly.NewCollector(
colly.AllowedDomains("trendmicro.com", "documents.trendmicro.com"),
)
c.OnHTML("a[href]", func(e *colly.HTMLElement) {
link := e.Attr("href")
fmt.Printf("Link found: %q -> %s
", e.Text, link)
if strings.Contains(link, tld1) {
c.Visit(e.Request.AbsoluteURL(link))
}
})
c.OnRequest(func(r * colly.Request) {
fmt.Println("Visiting", r.URL.String())
})
c.Visit("https://trendmicro.com/vinfo/us/security/research-and-analysis/threat-reports")
}