![img]
![img](https://img-mid.csdnimg.cn/release/static/image/mid/ask/8
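These four images are my program. COL2 and COL3 are already written. I tried a test crawl, but nothing at all was exported to Excel. Where did I go wrong? Any guidance is appreciated.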
Conclusions:
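1. Your URL is wrong; even its format is broken from the start: in https://liansai.500.com/zuqiu-6296/jifen-17831/2023-04-10&page=1 there is no `?` before `page=1`, so `&page=1` is just part of the path, not a query parameter.
2. The values you extract after col2 are also wrong; better to drop that approach altogether.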
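To see why the URL in point 1 is malformed, Python's standard `urllib.parse.urlsplit` can take it apart. This only parses the string (no network request); because there is no `?`, the query component comes out empty:

```python
from urllib.parse import urlsplit

# The URL from the question: note there is no '?' before 'page=1'
bad = 'https://liansai.500.com/zuqiu-6296/jifen-17831/2023-04-10&page=1'
parts = urlsplit(bad)

print(parts.path)         # /zuqiu-6296/jifen-17831/2023-04-10&page=1
print(repr(parts.query))  # '' (no query string at all)
```

A query string has to begin with `?`; `&` only separates parameters after that.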
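Approach: if you already know which round you want, you don't need to fetch the total number of rounds.
1. Parse the HTML of https://liansai.500.com/zuqiu-6296/jifen-17831 to get the total number of pages (rounds). (This part of yours is fine.)
2. For each page's data, request https://liansai.500.com/index.php?c=score&a=getmatch&stid=17831&round=38, which returns the data as JSON. (If the site adds some verification later, we can deal with it then; at least for now there is none.)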
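Since the JSON endpoint takes the round number as its `round` query parameter, the per-round URLs can be generated from the total obtained in step 1 instead of hard-coding 38. A small offline sketch, assuming the total is already an integer; the helper name `round_urls` is hypothetical. `urlencode` takes care of placing the `?` and `&` correctly:

```python
from urllib.parse import urlencode

BASE = 'https://liansai.500.com/index.php'

def round_urls(stid, total_rounds):
    """Build one getmatch URL per round, 1..total_rounds (hypothetical helper)."""
    urls = []
    for rnd in range(1, total_rounds + 1):
        # dict order is preserved, so the parameters appear as c, a, stid, round
        query = urlencode({'c': 'score', 'a': 'getmatch', 'stid': stid, 'round': rnd})
        urls.append(BASE + '?' + query)
    return urls

urls = round_urls(17831, 38)
print(urls[-1])
# https://liansai.500.com/index.php?c=score&a=getmatch&stid=17831&round=38
```

`get_page()` in the test code returns the link text as a string, so it would need `int(...)` before being passed here, assuming that text is just the round number.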
Test:
```python
import openpyxl
import requests
from lxml import etree


def get_page():
    # Parse the standings page and read the text of the last round link
    r = requests.get('https://liansai.500.com/zuqiu-6296/jifen-17831', timeout=200).text
    page_max = etree.HTML(r).xpath('//*[@id="match_group"]/li[last()]/a/text()')
    page_total = page_max[0]
    return page_total


def get_data(page):
    # The getmatch endpoint returns one round's matches as JSON
    url = 'https://liansai.500.com/index.php?c=score&a=getmatch&stid=17831&round={0}'.format(page)
    r = requests.get(url, timeout=200)
    return r.json()


def to_excel(file_name, data_list):
    wb = openpyxl.Workbook()
    for x in data_list:
        wb.active.append(x)  # each list becomes one spreadsheet row
    wb.save(file_name)


if __name__ == '__main__':
    p_num = get_page()
    print('Total rounds: {0}'.format(p_num))
    page = 38
    d = get_data(page)
    data = [['Round', 'Match time', 'Home team', 'Home score', 'Away team', 'Away score']]
    for x in d:
        # 'hscore' (home score) is an assumed field name, by analogy with 'hname'/'gname';
        # the original used 'gscore' for both the home and the away score
        data.append([x['round'], x['stime'], x['hname'], x['hscore'], x['gname'], x['gscore']])
    to_excel('d:\\Desktop\\d38.xlsx', data)
    print('Round {0}: {1} matches'.format(page, len(data) - 1))  # minus the header row
```
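To collect every round into one sheet instead of only round 38, the row-building step can be pulled out into a function and reused in a loop. The sketch below runs offline and assumes each match is a dict with the keys used in the test code (`round`, `stime`, `hname`, `gname`, `gscore`), plus `hscore` for the home score, which is an assumed field name. The function name `matches_to_rows` is hypothetical, and the sample match is made up purely to show the shape:

```python
HEADER = ['Round', 'Match time', 'Home team', 'Home score', 'Away team', 'Away score']

def matches_to_rows(matches):
    """Turn a list of match dicts into spreadsheet rows (hypothetical helper).

    dict.get() is used so a missing key yields None instead of raising KeyError.
    'hscore' is an assumed field name for the home team's score.
    """
    rows = []
    for m in matches:
        rows.append([m.get('round'), m.get('stime'), m.get('hname'),
                     m.get('hscore'), m.get('gname'), m.get('gscore')])
    return rows

# Made-up sample shaped like one item of the getmatch JSON (illustration only)
sample = [{'round': 38, 'stime': '2023-05-28 22:00', 'hname': 'Team A',
           'hscore': 2, 'gname': 'Team B', 'gscore': 1}]
data = [HEADER] + matches_to_rows(sample)
print(data[1])  # [38, '2023-05-28 22:00', 'Team A', 2, 'Team B', 1]
```

With `get_data` and `to_excel` from the test code, the loop would be roughly `for page in range(1, total + 1): data += matches_to_rows(get_data(page))`, followed by a single `to_excel(...)` call at the end.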