Hi everyone, I've just finished learning the basics of web scraping and I'm practicing on a real target: crawling 50 pages of content from
http://fund.eastmoney.com/manager/jjjl_all_penavgrowth_desc.html?rd=0.770561125401394#dt14;mcreturnjson;ftall;pn20;pi1;scpenavgrowth;stdesc
Only the number after pi in the URL changes, from 1 to 50. I'm using Selenium, and the content it extracts is parsed correctly, but it always scrapes the first page no matter which page number I request.
This has had me stuck for days; I've searched for answers and rewritten the code until I broke it. Any help would be much appreciated, thanks!
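One thing I noticed while debugging, which may be the root cause: everything after `#` is a URL *fragment*, and a browser does not re-request a page when only the fragment changes, so `driver.get` with a new `pi` value can leave the already-loaded first page in place. A minimal check with the standard library (the two URLs below are just the page-1 and page-2 variants of the address above):

```python
from urllib.parse import urlsplit

base = ('http://fund.eastmoney.com/manager/jjjl_all_penavgrowth_desc.html'
        '?rd=0.770561125401394'
        '#dt14;mcreturnjson;ftall;pn20;pi{};scpenavgrowth;stdesc')

page1, page2 = urlsplit(base.format(1)), urlsplit(base.format(2))

# Everything the server ever sees (scheme, host, path, query) is identical...
assert page1[:4] == page2[:4]
# ...only the fragment differs, and fragments are handled purely client-side.
assert page1.fragment != page2.fragment
print(page2.fragment)  # dt14;mcreturnjson;ftall;pn20;pi2;scpenavgrowth;stdesc
```

If that is what is happening, forcing a reload after each `driver.get` (for example with `driver.refresh()`), so the page's JavaScript re-reads the fragment on load, is one thing worth trying.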
import csv
import time

from selenium import webdriver                          # Selenium WebDriver
from selenium.webdriver.chrome.options import Options   # Chrome launch options
from selenium.webdriver.common.by import By

chrome_options = Options()                  # configure Chrome
chrome_options.add_argument('--headless')   # run Chrome in headless (no-window) mode
driver = webdriver.Chrome(options=chrome_options)

csv_file = open('基金经理2.28.csv', 'w', newline='', encoding='utf-8-sig')
writer = csv.writer(csv_file)
writer.writerow(['经理', '时间', '基金规模', '收益'])

url = ('http://fund.eastmoney.com/manager/jjjl_all_penavgrowth_desc.html'
       '?rd=0.770561125401394'
       '#dt14;mcreturnjson;ftall;pn20;pi{real_page};scpenavgrowth;stdesc')

for page in range(1, 4):        # testing with 3 pages for now; the full run would be range(1, 51)
    act_url = url.format(real_page=page)
    driver.get(act_url)
    time.sleep(10)              # wait for the page's JavaScript to render the table
    cells = driver.find_elements(By.TAG_NAME, 'td')   # grab all cells once per page
    for x in range(1, 50):      # row offsets worked out by inspecting the table
        all_manager = cells[5 + 7 * x].text
        all_time = cells[8 + 7 * x].text
        all_money = cells[9 + 7 * x].text
        all_gain = cells[10 + 7 * x].text
        writer.writerow([all_manager, all_time, all_money, all_gain])

csv_file.close()
driver.quit()
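For what it's worth, the fragment looks like a list of parameters that the page's own JavaScript parses to decide what to render. This is just my reading of it, not something documented, but assuming each token is a two-letter key followed by its value, it decodes as:

```python
fragment = 'dt14;mcreturnjson;ftall;pn20;pi1;scpenavgrowth;stdesc'

# Assumed scheme: two-letter key, rest of the token is the value.
params = {token[:2]: token[2:] for token in fragment.split(';')}
print(params)
# {'dt': '14', 'mc': 'returnjson', 'ft': 'all', 'pn': '20', 'pi': '1',
#  'sc': 'penavgrowth', 'st': 'desc'}
```

Read this way, pn would be the rows per page (20) and pi the page index, which would also mean the inner loop should only expect 20 managers per page, not 49.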