2018-07-12 07:23

Trouble parsing HTML with BeautifulSoup or golang colly

For the record, I have written quite a few scrapers successfully with both libraries, but this one has me stumped. Here is a screenshot of the data I'm trying to scrape (you can also open the actual link used in the GET request below):

[Screenshot of the data to be scraped]

I attempt to target the div.section_content:

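import requests
from bs4 import BeautifulSoup

# Fetch the box score page and parse it.
html = requests.get("https://www.baseball-reference.com/boxes/ARI/ARI201803300.shtml").text
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly
# Collect every div whose class is "section_content".
soup.find_all("div", {"class": "section_content"})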

Printing the result of the last line shows some other divs, but not the one with the pitching data.

However, I can see that it's in the text, so it's not a JavaScript-triggered loading problem (the word "Pitching" only comes up in that table):

>>> "Pitching" in soup.text
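
A membership test like the one above only says the word occurs somewhere in the document, not which kind of node holds it. As a possible next diagnostic, the sketch below reports whether a phrase shows up in ordinary text nodes or inside an HTML comment. It uses only the standard library; PhraseLocator, locate_phrase, and the sample markup are hypothetical names and data made up for illustration, and nothing here establishes that the real page is built this way.

```python
from html.parser import HTMLParser


class PhraseLocator(HTMLParser):
    """Record whether a phrase occurs in text nodes or inside HTML comments."""

    def __init__(self, phrase):
        super().__init__()
        self.phrase = phrase
        self.in_text = False
        self.in_comment = False

    def handle_data(self, data):
        # Called for character data between tags.
        if self.phrase in data:
            self.in_text = True

    def handle_comment(self, data):
        # Called with the contents of each <!-- ... --> block.
        if self.phrase in data:
            self.in_comment = True


def locate_phrase(html, phrase):
    """Return where the phrase was seen: in text nodes, in comments, or both."""
    parser = PhraseLocator(phrase)
    parser.feed(html)
    parser.close()
    return {"text": parser.in_text, "comment": parser.in_comment}


# Hypothetical sample markup, not taken from the real page.
sample = """
<div class="section_content"><p>Batting</p></div>
<!-- <div class="section_content"><table><caption>Pitching</caption></table></div> -->
"""
result = locate_phrase(sample, "Pitching")
print(result)
```

To try it on the real page, pass the html string from the requests call to locate_phrase in place of the sample.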

Here is an abbreviated version of one of the golang attempts:

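package main

import (
    "github.com/gocolly/colly"
    // ... (any other imports elided in the original)
)

func main() {
    c := colly.NewCollector() // options, if any, elided in the original
    c.OnHTML("div.table_wrapper", func(e *colly.HTMLElement) {
        // ... (handler body elided in the original)
    })
    // ... (rest of main elided in the original)
}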
