douling8087 2019-07-14 08:53

Web scraping after the site loads content via Ajax

I'm trying to get colly to scrape the following page: https://www56.muenchen.de/termin/index.php?loc=BB.

Here is my code:

package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly"
)

func main() {
    c := colly.NewCollector(
        colly.IgnoreRobotsTxt(),
        colly.Async(false),
    )

    c.OnHTML("html", func(e *colly.HTMLElement) {
        fmt.Println(e.Text)
    })

    c.OnError(func(_ *colly.Response, err error) {
        log.Println("Something went wrong:", err)
    })

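    // Register every callback before calling Visit; with Async(false),
    // Visit blocks until the request is done, so a callback added later never fires.
    c.OnScraped(func(r *colly.Response) {
        fmt.Println("Finished")
    })

    if err := c.Visit("https://www56.muenchen.de/termin/index.php?loc=BB"); err != nil {
        log.Fatal(err)
    }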
}

The problem is that the page loads some of its content only after the initial visit. I'm unsure how to tell colly to "wait" until that has happened and then look at the result.

Looking forward to some ideas.


  • douzhouhan4618 2019-07-14 10:11

    It can't. Waiting for that content would mean running the page's client-side code, and colly does not execute JavaScript, so it never sees anything loaded via Ajax.

    To simulate a browser you can use Selenium or PhantomJS, as the link above suggests.

    Selected by the asker as the best answer.