问题描述:实现多个网页的表格获取,并保存为excel表,excel表分页sheet分别以XX
命名,对应于上面的获取表格的名称。
已实现功能:单个页面表格的读取和保存;
未实现功能:循环网页,sheet改名。
请大神帮忙改一下代码,看看如何实现。
提供的两个网址如下:
1.https://www.marketbeat.com/stocks/NYSE/BILL/institutional-ownership/
2.https://www.marketbeat.com/stocks/NYSE/SPG/institutional-ownership/
其中,BIll和SPG是可以做成循环网址读取的,表格sheet命名也是与此对应的。
目前已实现代码如下:
import requests
from bs4 import BeautifulSoup
import xlwt
# 请求headers 模拟谷歌浏览器访问
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
}
def get_data():
response = requests.get('https://www.marketbeat.com/stocks/NYSE/BILL/institutional-ownership/', headers=headers)
bs = BeautifulSoup(response.text, 'lxml')
# 标题处理
title = bs.find_all('th')
data_list_title = [] # 定义一个空列表
for data in title:
data_list_title.append(data.text.strip()) # 获取标签的内容去掉两边空格并添加到列表里
# 内容处理
content = bs.find_all('td')
data_list_content = [] # 定义一个空列表
for data in content:
data_list_content.append(data.text.strip()) # 获取标签的内容去掉两边空格并添加到列表里
# 语句featList = [example[i] for example in dataSet]作用为: 将dataSet中的数据按行依次放入example中,然后取得example中的example[i]元素,放入列表featList中
del(data_list_content[80])
print(*data_list_content,sep='\n')
new_list = [data_list_content[i:i+7] for i in range(0, len(data_list_content)-17,8)]
# 存入excel表格
book = xlwt.Workbook()
sheet1 = book.add_sheet('sheet1', cell_overwrite_ok=True)
# 标题存入
heads = data_list_title[:] # 将data_list_title第一位到最后一位赋值给heads
ii = 0
for head in heads:
sheet1.write(0, ii, head)
ii += 1
# 内容录入
i = 1
for list in new_list:
j = 0
for data in list:
sheet1.write(i, j, data)
j += 1
i += 1
# 文件保存
book.save('./data.xls')
print("全部完成")
# 调用
get_data()
已实现表格数据如下
Reporting Date Hedge Fund Shares Held Market Value % of Portfolio Quarterly Change in Shares Ownership in Company
10/9/2020 Envestnet Asset Management Inc. 2,174 $0.22M 0.0% N/A 0.003%
10/6/2020 Avitas Wealth Management LLC 5,410 $0.54M 0.2% +57.2% 0.007%
9/28/2020 Manchester Capital Management LLC 1,500 $0.14M 0.0% +50.0% 0.002%
9/22/2020 Atria Investments LLC 31,463 $2.84M 0.1% N/A 0.039%
9/15/2020 Two Sigma Advisers LP 5,800 $0.52M 0.0% N/A 0.007%
9/15/2020 Schonfeld Strategic Advisors LLC 62,798 $5.67M 0.1% N/A 0.078%
9/4/2020 Principal Financial Group Inc. 2,390 $0.22M 0.0% N/A 0.003%
8/27/2020 Neuberger Berman Group LLC 34,087 $3.08M 0.0% +88.1% 0.047%
8/25/2020 Nuveen Asset Management LLC 96,953 $8.75M 0.0% +192.5% 0.134%
8/20/2020 Charles Schwab Investment Management Inc. 95,013 $8.57M 0.0% N/A 0.131%
8/18/2020 Blackstone Group Inc 172,686 $15.58M 0.1% N/A 0.238%
8/17/2020 Engineers Gate Manager LP 5,927 $0.54M 0.0% N/A 0.008%
8/17/2020 California State Teachers Retirement System 27,675 $2.50M 0.0% +66.5% 0.038%
8/17/2020 Townsquare Capital LLC 40,538 $3.37M 0.2% N/A 0.056%
8/17/2020 Great West Life Assurance Co. Can 1,031 $93K 0.0% N/A 0.001%
8/17/2020 Public Employees Retirement System of Ohio 6,024 $0.54M 0.0% +49.0% 0.008%
8/17/2020 Sei Investments Co. 369,166 $33.30M 0.1% +3,354.7% 0.509%
8/17/2020 Capital Impact Advisors LLC 28,000 $2.53M 0.8% N/A 0.039%
8/17/2020 Private Advisor Group LLC 5,450 $0.49M 0.0% N/A 0.008%