When I parse the page with bs4, I only get the text content of the first page of the document.
Here is the code I wrote:
import requests
from bs4 import BeautifulSoup

url = "https://wenku.baidu.com/view/92996ded172ded630b1cb660.html"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60"}

# Fetch the HTML as the server returns it (requests does not execute JavaScript)
page_text = requests.get(url=url, headers=headers).text
soup = BeautifulSoup(page_text, "lxml")

# select() returns a list of elements matching the CSS selector
containers = soup.select("#reader-container")
print(containers)
for p in containers:
    text = p.text
    print(text)
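
One way to narrow this down is to check what text the downloaded HTML actually contains inside `#reader-container`, independent of bs4. The following is only a minimal sketch, not a fix: it swaps in the standard library's `html.parser` so it runs without extra installs, and the names `ContainerText` and `container_text` are made up for illustration. It assumes reasonably well-formed HTML in which every non-void tag is closed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ContainerText(HTMLParser):
    """Collect the text found inside the element with a given id (hypothetical helper)."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0    # 0 means we are outside the target element
        self.chunks = []  # non-empty text fragments found inside it

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1          # nested element inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1           # entered the target element

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def container_text(html, target_id="reader-container"):
    """Return the list of text fragments inside the element with id=target_id."""
    parser = ContainerText(target_id)
    parser.feed(html)
    parser.close()
    return parser.chunks


# Small offline sample standing in for a downloaded page
sample = ('<html><body><div id="reader-container"><p>page 1</p><br>'
          '<div><span>more</span></div></div><p>outside</p></body></html>')
print(container_text(sample))
```

Calling `container_text(page_text)` on the response from the code above shows exactly which fragments are present in the raw HTML. If only the first page's text comes back, the later pages are probably not part of the HTML that `requests` receives (for example because the site loads them afterwards with JavaScript, which `requests` does not run), and in that case no selector applied to this response would find them.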