FTR I have written quite a few scrapers successfully in both frameworks but I'm stumped. Here is a screenshot of the data I'm trying to scrape (you can also go to the actual link in the get request):
I attempt to target the div.section_content
:
import requests
from bs4 import BeautifulSoup
html = requests.get("https://www.baseball-reference.com/boxes/ARI/ARI201803300.shtml").text
soup = BeautifulSoup(html)
soup.findAll("div", {"class": "section_content"})
Printing the last line shows some other divs, but not the one with the pitching data.
However, I can see it's in the text, so it's not a javascript triggered loading problem (the phrase "Pitching" only comes up in that table):
>>> "Pitching" in soup.text
True
Here is an abbreviated version of one of the golang attempts:
package main
import (
"fmt"
"github.com/gocolly/colly"
)
func main() {
c := colly.NewCollector(
colly.AllowedDomains("www.baseball-reference.com"),
)
c.OnHTML("div.table_wrapper", func(e *colly.HTMLElement) {
fmt.Println(e.ChildText("div.section_content"))
})
c.Visit("https://www.baseball-reference.com/boxes/ARI/ARI201803300.shtml")
} }