zesi111 2020-10-21 11:53 采纳率: 100%
浏览 314
已采纳

python实现多个网页链接的表格循环获取功能

问题描述:实现多个网页的表格获取,并保存为excel表,excel表分页sheet分别以XX
命名,对应于上面的获取表格的名称。
已实现功能:单个页面表格的读取和保存;
未实现功能:循环网页,sheet改名。
请大神帮忙改一下代码,看看如何实现。
提供的两个网址如下:
1.https://www.marketbeat.com/stocks/NYSE/BILL/institutional-ownership/
2.https://www.marketbeat.com/stocks/NYSE/SPG/institutional-ownership/
其中,BIll和SPG是可以做成循环网址读取的,表格sheet命名也是与此对应的。
目前已实现代码如下:

import requests
from bs4 import BeautifulSoup
import xlwt

# 请求headers 模拟谷歌浏览器访问
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
}


def get_data():
    response = requests.get('https://www.marketbeat.com/stocks/NYSE/BILL/institutional-ownership/', headers=headers)
    bs = BeautifulSoup(response.text, 'lxml')

    # 标题处理
    title = bs.find_all('th')
    data_list_title = []  # 定义一个空列表
    for data in title:
        data_list_title.append(data.text.strip())  # 获取标签的内容去掉两边空格并添加到列表里

    # 内容处理
    content = bs.find_all('td')
    data_list_content = []  # 定义一个空列表
    for data in content:
        data_list_content.append(data.text.strip())  # 获取标签的内容去掉两边空格并添加到列表里
    # 语句featList = [example[i] for example in dataSet]作用为: 将dataSet中的数据按行依次放入example中,然后取得example中的example[i]元素,放入列表featList中
    del(data_list_content[80])
    print(*data_list_content,sep='\n')

    new_list = [data_list_content[i:i+7] for i in range(0, len(data_list_content)-17,8)]

    # 存入excel表格
    book = xlwt.Workbook()
    sheet1 = book.add_sheet('sheet1', cell_overwrite_ok=True)

    # 标题存入
    heads = data_list_title[:]  # 将data_list_title第一位到最后一位赋值给heads
    ii = 0
    for head in heads:
        sheet1.write(0, ii, head)
        ii += 1

    # 内容录入
    i = 1
    for list in new_list:
        j = 0
        for data in list:
            sheet1.write(i, j, data)
            j += 1
        i += 1
    # 文件保存
    book.save('./data.xls')


print("全部完成")

# 调用
get_data()

已实现表格数据如下
Reporting Date Hedge Fund Shares Held Market Value % of Portfolio Quarterly Change in Shares Ownership in Company

10/9/2020 Envestnet Asset Management Inc. 2,174 $0.22M 0.0% N/A 0.003%

10/6/2020 Avitas Wealth Management LLC 5,410 $0.54M 0.2% +57.2% 0.007%

9/28/2020 Manchester Capital Management LLC 1,500 $0.14M 0.0% +50.0% 0.002%

9/22/2020 Atria Investments LLC 31,463 $2.84M 0.1% N/A 0.039%

9/15/2020 Two Sigma Advisers LP 5,800 $0.52M 0.0% N/A 0.007%

9/15/2020 Schonfeld Strategic Advisors LLC 62,798 $5.67M 0.1% N/A 0.078%

9/4/2020 Principal Financial Group Inc. 2,390 $0.22M 0.0% N/A 0.003%

8/27/2020 Neuberger Berman Group LLC 34,087 $3.08M 0.0% +88.1% 0.047%

8/25/2020 Nuveen Asset Management LLC 96,953 $8.75M 0.0% +192.5% 0.134%

8/20/2020 Charles Schwab Investment Management Inc. 95,013 $8.57M 0.0% N/A 0.131%

8/18/2020 Blackstone Group Inc 172,686 $15.58M 0.1% N/A 0.238%

8/17/2020 Engineers Gate Manager LP 5,927 $0.54M 0.0% N/A 0.008%

8/17/2020 California State Teachers Retirement System 27,675 $2.50M 0.0% +66.5% 0.038%

8/17/2020 Townsquare Capital LLC 40,538 $3.37M 0.2% N/A 0.056%

8/17/2020 Great West Life Assurance Co. Can 1,031 $93K 0.0% N/A 0.001%

8/17/2020 Public Employees Retirement System of Ohio 6,024 $0.54M 0.0% +49.0% 0.008%

8/17/2020 Sei Investments Co. 369,166 $33.30M 0.1% +3,354.7% 0.509%

8/17/2020 Capital Impact Advisors LLC 28,000 $2.53M 0.8% N/A 0.039%

8/17/2020 Private Advisor Group LLC 5,450 $0.49M 0.0% N/A 0.008%

  • 写回答

1条回答 默认 最新

      报告相同问题?

      相关推荐 更多相似问题

      悬赏问题

      • ¥20 有人知道怎么将vsi格式的图片文件,转换为svs格式的文件吗
      • ¥15 历史模拟法计算var实验报告
      • ¥15 白鲸算法优化K值的VMD分解出错
      • ¥20 写一个基于52单片机用hc-05蓝牙模块控制28BYJ-48步进电机进行旋转,在手机蓝牙串口输入1019电机转半圈,输入2038电机转一圈,输入0复位的代码吗
      • ¥15 求51单片机8位数码管计时器程序
      • ¥20 matlab识别螺母边缘
      • ¥15 python 6x6游戏加登录、记录系统
      • ¥100 基于做一个模拟智慧路灯
      • ¥15 ME21N 创建采购成功并且生成采购订单号,但显示“快件文档更新已取消”,SM13看错误提示为如下截图:
      • ¥30 android 集成fmod实现变声功能中遇到的问题