SDVAERVA
2021-03-13 21:56

Scraper loop never advances past the first page

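Hi everyone, I have just finished learning the basics of web scraping and, as a first real exercise, I am trying to scrape 50 pages of content from: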

http://fund.eastmoney.com/manager/jjjl_all_penavgrowth_desc.html?rd=0.770561125401394#dt14;mcreturnjson;ftall;pn20;pi1;scpenavgrowth;stdesc

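Between pages, only the number after pi in the URL changes, from 1 to 50. I am using selenium. The scraped values are correct, but every iteration returns the content of the first page.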

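This has been bothering me for several days. I have searched for material and rewritten the code until it no longer makes sense. Any help is appreciated, thanks.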
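A note on the URL above: the page number sits after the `#`, i.e. in the URL fragment. A fragment is not sent to the server, and navigating to a URL that differs only in its fragment does not reload the document, which would be consistent with the symptom described. The sketch below uses only the standard library to show that the three page URLs reduce to the same request; `split_page_urls` is a helper name made up for this illustration.

```python
from urllib.parse import urldefrag

# URL template copied from the question; only pi{real_page} varies.
URL_TEMPLATE = (
    'http://fund.eastmoney.com/manager/jjjl_all_penavgrowth_desc.html'
    '?rd=0.770561125401394'
    '#dt14;mcreturnjson;ftall;pn20;pi{real_page};scpenavgrowth;stdesc'
)


def split_page_urls(template, pages):
    """Return (url_without_fragment, fragment) for each page number."""
    return [tuple(urldefrag(template.format(real_page=p))) for p in pages]


parts = split_page_urls(URL_TEMPLATE, range(1, 4))
requested = {url for url, _ in parts}          # what is actually requested
fragments = [fragment for _, fragment in parts]  # stays inside the browser

print(requested)  # a set with a single URL, identical for all pages
print(fragments)  # three fragments that differ only in the pi value
```

Whether the page's own JavaScript reacts to a fragment change is a separate matter and is not something this sketch can show.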

import csv
import time

from selenium import webdriver  # browser automation
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

chrome_options = Options()
chrome_options.add_argument('--headless')  # run Chrome without a visible window
driver = webdriver.Chrome(options=chrome_options)

csv_file = open('基金经理2.28.csv', 'w', newline='')
writer = csv.writer(csv_file)
writer.writerow(['经理', '时间', '基金规模', '收益'])  # manager, time, fund size, return

url = 'http://fund.eastmoney.com/manager/jjjl_all_penavgrowth_desc.html?rd=0.770561125401394#dt14;mcreturnjson;ftall;pn20;pi{real_page};scpenavgrowth;stdesc'
# Defined in the original post but not used by the Selenium calls below.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36 Edg/88.0.705.81'}

number_manager_list = list(range(1, 50))  # row indices x = 1..49

for page in range(1, 4):
    # Only pi{real_page}, inside the '#...' part of the URL, changes per iteration.
    act_url = url.format(real_page=page)
    driver.get(act_url)
    time.sleep(10)
    cells = driver.find_elements(By.TAG_NAME, 'td')  # look the cells up once per page
    for x in number_manager_list:
        all_manager = cells[5 + 7 * x].text
        all_time = cells[8 + 7 * x].text
        all_money = cells[9 + 7 * x].text
        all_gain = cells[10 + 7 * x].text
        writer.writerow([all_manager, all_time, all_money, all_gain])

csv_file.close()
driver.quit()
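The index arithmetic in the loop above (`5 + 7 * x`, `8 + 7 * x`, and so on) can be separated from Selenium and checked on plain lists. The following is a minimal sketch under the assumption, taken from the question's code and not verified against the page, that each row spans 7 cells and that the wanted columns sit at offsets 5, 8, 9 and 10. `rows_from_cells` is a hypothetical helper; it stops when the cells run out, so the row count does not have to be hard-coded.

```python
def rows_from_cells(cells, first_row=1, stride=7, offsets=(5, 8, 9, 10)):
    """Group a flat list of cell texts into rows.

    Mirrors the lookups cells[offset + stride * x] for x = first_row, first_row + 1, ...
    and stops as soon as a row would reach past the end of the list.
    """
    rows = []
    x = first_row
    while max(offsets) + stride * x < len(cells):
        rows.append([cells[offset + stride * x] for offset in offsets])
        x += 1
    return rows


# Stand-in data: 30 fake cell texts instead of a live page.
demo_cells = [f'c{i}' for i in range(30)]
demo_rows = rows_from_cells(demo_cells)
print(demo_rows)
```

With Selenium, the input would be something like `[td.text for td in cells]`, built once per page.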

1 answer

  • cclxpp123 2021-03-13 22:53
    Accepted

    Find the next-page button and click it; or send the page parameters in a POST request.
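The first suggestion (clicking the next-page button) could look roughly like the sketch below. It is not the answerer's code: `scrape_pages` is a made-up helper, and the locators in the usage comment are guesses that would need to be checked in the browser's developer tools. The function only relies on `find_element`, `find_elements`, `.text` and `.click()`, which Selenium's WebDriver provides.

```python
import time


def scrape_pages(driver, next_button_locator, row_locator, pages, delay=2.0):
    """Collect row texts from `pages` consecutive pages by clicking "next".

    `driver` is a Selenium WebDriver that already shows page 1.
    Both locators are (by, value) tuples as accepted by find_element(s).
    """
    all_pages = []
    for page in range(1, pages + 1):
        # Read whatever rows are currently rendered.
        rows = driver.find_elements(*row_locator)
        all_pages.append([row.text for row in rows])
        if page < pages:
            # Clicking lets the page's own JavaScript load the next batch.
            driver.find_element(*next_button_locator).click()
            time.sleep(delay)  # crude pause; an explicit wait would be more robust
    return all_pages


# Hypothetical usage (locators are assumptions, not taken from the site):
# from selenium import webdriver
# from selenium.webdriver.common.by import By
# driver = webdriver.Chrome()
# driver.get(url.format(real_page=1))
# data = scrape_pages(driver, (By.LINK_TEXT, '下一页'),
#                     (By.CSS_SELECTOR, 'table tbody tr'), pages=50)
```

The second suggestion (POST with parameters) depends on the request the page actually makes, which can be found in the browser's network tab; it is not sketched here because the endpoint is not given in the thread.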
