doufuhao8085 2018-10-24 12:54

How do I find an HTML element, or a group of HTML elements, by id or class in colly?

I am using colly to scrape a website. In the OnHTML callback:

package main

import (
    "fmt"
    "github.com/gocolly/colly"
)

func main() {

    // Instantiate default collector
    c := colly.NewCollector()

    // Call this callback on every <h3> element
    c.OnHTML("h3", func(e *colly.HTMLElement) {
        link := e.Text
        // Print the heading text
        fmt.Printf("Link found: %q -> %s\n", e.Text, link)
        // Visit the heading text as a URL, resolved relative to the current page.
        // No AllowedDomains restriction is set, so any domain may be visited.
        c.Visit(e.Request.AbsoluteURL(link))
    })

    // Before making a request, print "Visiting ..."
    c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting", r.URL.String())
    })

    // Start scraping on https://bbs.archusers.ir/
    c.Visit("https://bbs.archusers.ir/")
}

For example, I want to get all elements with the id 'Name', or all elements with the class 'Name'. How can I do this?


1 answer

  • dt614037527 2018-10-27 08:49

    I found my answer here. It's a really great tutorial for the colly framework.

    OnHTML is a powerful tool. It accepts any CSS selector (e.g. div.my_fancy_class to match by class, or #someElementId to match by id), and you can attach multiple OnHTML callbacks to your collector to handle different page types.

