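There are three functions here. The first, Find_web, fetches the page source. The second, Process_text, collects all the td descendants of the tbody tag into the list named list, which ends up as a two-dimensional array: each inner list holds all the information for one university, and the outer list covers all the universities. The third, Find_string, takes each element out of this two-dimensional list, turns it into a soup, and then uses soup.td.string to extract the NavigableString. Printing the type of the elements in list also shows bs4.element.Tag. However, the script fails with an error, and I would appreciate any help.
The error: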
Traceback (most recent call last):
  File "C:/Users/98047/Desktop/Source code HAN CHEN/Vehicle Recognition/main/xas.py", line 43, in <module>
    Find_string()
  File "C:/Users/98047/Desktop/Source code HAN CHEN/Vehicle Recognition/main/xas.py", line 38, in Find_string
    soup = BeautifulSoup(final, "html.parser")
  File "C:\Program Files\Python38\lib\site-packages\bs4\__init__.py", line 286, in __init__
    markup = markup.read()
TypeError: 'NoneType' object is not callable
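A possible reading of this traceback, offered as a sketch rather than a confirmed diagnosis: the last frame shows bs4 calling markup.read(), which suggests the constructor treated its argument as a file-like object. On a Tag, attribute access is shorthand for find(), so tag.read searches for a child tag named read and returns None when there is none. Calling that None would give exactly this TypeError. The snippet below only demonstrates the attribute lookup. The HTML string and the name "Example University" are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table standing in for the real ranking page.
html = "<table><tbody><tr><td>1</td><td>Example University</td></tr></tbody></table>"
soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")

# Attribute access on a Tag is shorthand for find():
# td.read looks for a <read> child tag, and there is none.
read_attr = td.read
print(read_attr)   # None

# The Tag already exposes its text, so no second parse is needed.
print(td.string)   # 1
```

If this reading is right, the element passed to BeautifulSoup in Find_string is already a parsed Tag, and the constructor expects markup text or a file object instead.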
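import re
from bs4 import BeautifulSoup
import requests
import bs4

url = "http://www.zuihaodaxue.cn/zuihaodaxuepaiming2016.html"
list = []  # 2-D array: one inner list of <td> Tags per university


def Find_web(url):
    # Download the page and return its source text ("" on failure).
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        # print(r.text)
        return r.text
    except requests.RequestException:
        print("error")
        return ""


def Process_text(text):
    # Collect the <td> cells of every row inside <tbody>.
    soup = BeautifulSoup(text, "html.parser")
    for i in soup.find("tbody").children:
        if isinstance(i, bs4.element.Tag):  # skip whitespace strings between rows
            tds = i.find_all('td')
            list.append(tds)


def Find_string():
    # Print the first three cells of the first ten universities.
    for num in range(10):
        u = list[num]
        # print(u)
        for n in range(3):
            final = u[n]
            if isinstance(final, bs4.element.Tag):
                print(type(final))
                print(final)
                # The traceback points here: final is already a Tag, not markup text.
                soup = BeautifulSoup(final, "html.parser")
                print(soup.td.string)


text = Find_web(url)
Process_text(text)
Find_string()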
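One way the third function might be written so that it avoids re-parsing is sketched below. It is not tested against the real page: it assumes the elements stored by Process_text are td Tags, as the printed type suggests, and it runs on a small made-up HTML sample so that no network access is needed. The names process_text, find_strings, max_rows and max_cols are hypothetical, as are the sample rows.

```python
import bs4
from bs4 import BeautifulSoup

# Made-up sample markup; the real page layout is assumed to be similar.
SAMPLE_HTML = """
<table><tbody>
<tr><td>1</td><td>Example University A</td><td>Province A</td><td>x</td></tr>
<tr><td>2</td><td>Example University B</td><td>Province B</td><td>y</td></tr>
</tbody></table>
"""


def process_text(text):
    """Return a list of rows, each row being a list of <td> Tags."""
    soup = BeautifulSoup(text, "html.parser")
    rows = []
    for child in soup.find("tbody").children:
        if isinstance(child, bs4.element.Tag):  # skip whitespace between rows
            rows.append(child.find_all("td"))
    return rows


def find_strings(rows, max_rows=10, max_cols=3):
    """Read .string directly from each Tag; slicing avoids IndexError on short tables."""
    return [[td.string for td in tds[:max_cols]] for tds in rows[:max_rows]]


rows = process_text(SAMPLE_HTML)
strings = find_strings(rows)
print(strings)

# If a second parse is really wanted, convert the Tag to markup text first.
reparsed = BeautifulSoup(str(rows[0][1]), "html.parser")
print(reparsed.td.string)
```

Returning the rows instead of appending to a global also avoids shadowing the built-in list, though that is a style choice and not the cause of the error.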