As shown in the figure below, the link sits inside this node:
I want to scrape the href value, i.e. /tjgb/20gx/36169.html
but when my code uses content_all = soup.find_all.table(class_="box"), nothing is scraped and the result is an empty list.
How do I locate exactly the node that contains the href?
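A minimal sketch of how BeautifulSoup is usually pointed at a tag and its links. It assumes, hypothetically, that the link lives inside a `<table class="box">`; the real page may be structured differently, so the markup here is a stand-in rather than the site's actual HTML. It shows the positional tag-name form of `find_all` and the equivalent CSS selector via `select`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: assumes the target link sits inside <table class="box">.
# The real page may differ; inspect it with the browser's developer tools.
html = """
<table class="box">
  <tr><td><a href="/tjgb/20gx/36169.html">Example report</a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all takes the tag name as its first positional argument and
# class_ (with a trailing underscore) as the CSS-class filter.
tables = soup.find_all("table", class_="box")

# Look inside each matching table for <a> tags that actually carry an href.
hrefs = [a["href"] for t in tables for a in t.find_all("a", href=True)]

# The same lookup written as a single CSS selector.
hrefs_css = [a["href"] for a in soup.select("table.box a[href]")]

print(hrefs)      # ['/tjgb/20gx/36169.html']
print(hrefs_css)  # ['/tjgb/20gx/36169.html']
```

If `find_all` still returns an empty list on the real page, nothing in the downloaded HTML matched that tag and class; searching `html` for the string `36169.html` is a quick way to see which element actually wraps the link.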
The site URL is http://tjcn.org/tjgb/20gx/index.html
Here is my code:
import re
import requests
from bs4 import BeautifulSoup

# Same browser User-Agent for every request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"}

for page in range(0, 10):
    # Page 0 is index.html; later pages are index_1.html, index_2.html, ...
    if page == 0:
        url = "http://www.tjcn.org/tjgb/20gx/index.html"
    else:
        url = f"http://www.tjcn.org/tjgb/20gx/index_{page}.html"
    print(url)
    response = requests.get(url, headers=headers)
    response.encoding = response.apparent_encoding  # use the encoding detected from the page content
    html = response.text
    soup = BeautifulSoup(html, "lxml")
    # find_all takes the tag name as its first argument, then attribute filters
    content_all = soup.find_all("table", class_="box")
    print(content_all)
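Since the goal is the href itself, one option is to skip the container and filter `<a>` tags by the shape of the link. This is a sketch under an assumption: report links are taken to look like `/tjgb/20gx/<digits>.html`, modelled on the single example above, and the helper name `extract_report_links` and the `sample` fragment are hypothetical. It uses the `re` module the code already imports, plus `urljoin` to turn the relative href into a full URL.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed pattern for report links, modelled on /tjgb/20gx/36169.html.
REPORT_HREF = re.compile(r"^/tjgb/20gx/\d+\.html$")


def extract_report_links(html, base_url):
    """Return absolute URLs of every <a> whose href matches REPORT_HREF."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # find_all accepts a compiled regex as an attribute filter;
    # tags without an href attribute are skipped automatically.
    for a in soup.find_all("a", href=REPORT_HREF):
        links.append(urljoin(base_url, a["href"]))
    return links


# Hypothetical fragment standing in for a downloaded listing page.
sample = """
<div>
  <a href="/tjgb/20gx/36169.html">Report A</a>
  <a href="/tjgb/20gx/index_1.html">Next page</a>
  <a href="http://example.com/">Ad</a>
</div>
"""

print(extract_report_links(sample, "http://www.tjcn.org/tjgb/20gx/index.html"))
# ['http://www.tjcn.org/tjgb/20gx/36169.html']
```

Inside the loop, this would be called as `extract_report_links(html, url)` in place of the `find_all("table", ...)` lookup. Matching on the href pattern keeps working even if the wrapping tag turns out not to be `table.box`, at the cost of depending on the URL scheme staying the same.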