m0_73837562 2023-06-03 17:36 采纳率: 0%
浏览 62

我主要就是这样把文章内容运行了一下,但是这个如何更改爬取多页我不太懂,能不能教我一下

https://blog.csdn.net/m0_62428181/article/details/129597479?spm=1001.2014.3001.5502
主要是根据这个博主,但是我根据这个博主只能爬取一页的信息,怎么样爬取多页的信息?


import csv
import random
import time
from time import sleep
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver import ChromeOptions
from selenium.webdriver.common.by import By
#2.导入库
option = ChromeOptions()
option.add_experimental_option('excludeSwitches', ['enable-automation'])
option.add_experimental_option('detach', True)
#去除浏览器识别
driver = webdriver.Chrome(options=option)
driver.get("https://www.51job.com/")
time.sleep(2) #防止加载缓慢,休眠2秒
script = 'Object.defineProperty(navigator, "webdriver", {get: () => false,});'
driver.execute_script(script)
driver.find_element(By.XPATH, '//*[@id="kwdselectid"]').click()
driver.find_element(By.XPATH, '//*[@id="kwdselectid"]').clear()
driver.find_element(By.XPATH, '//*[@id="kwdselectid"]').send_keys('会计')#定位输入框并查找相关职位
driver.find_element(By.XPATH, '/html/body/div[3]/div/div[1]/div/button').click()
# driver.implicitly_wait(10)
time.sleep(5)
print(driver.current_url)
jobData = driver.find_elements(By.XPATH, '//*[@id="app"]/div/div[2]/div/div/div[2]/div/div[2]/div/div[2]/div[1]/div')
for job in jobData:
        jobName = job.find_element(By.CLASS_NAME, 'jname.at').text
#         time.sleep(random.randint(5, 15) * 0.1)
        jobSalary = job.find_element(By.CLASS_NAME, 'sal').text
#         time.sleep(random.randint(5, 15) * 0.1)
        jobCompany = job.find_element(By.CLASS_NAME, 'cname.at').text
#         time.sleep(random.randint(5, 15) * 0.1)
        company_type_size = job.find_element(By.CLASS_NAME, 'dc.at').text
#         time.sleep(random.randint(5, 15) * 0.1)
        company_status = job.find_element(By.CLASS_NAME, 'int.at').text
#         time.sleep(random.randint(5, 15) * 0.1)
        address_experience_education = job.find_element(By.CLASS_NAME, 'd.at').text
#         time.sleep(random.randint(5, 15) * 0.1)
 
        try:
            job_welf = job.find_element(By.CLASS_NAME, 'tags').get_attribute('title')
        except:
            job_welf = '无数据'
        time.sleep(random.randint(5, 15) * 0.1)
 
        update_date = job.find_element(By.CLASS_NAME, 'time').text
        time.sleep(random.randint(5, 15) * 0.1)
for i in range(1,10):
    driver.find_element(By.XPATH, '//*[@id="jump_page"]').click()
    time.sleep(random.randint(10, 30) * 0.1)
    driver.find_element(By.XPATH, '//*[@id="jump_page"]').clear()
    time.sleep(random.randint(10, 40) * 0.1)
    driver.find_element(By.XPATH, '//*[@id="jump_page"]').send_keys(i)
    time.sleep(random.randint(10, 30) * 0.1)
    driver.find_element(By.XPATH,'//*[@id="app"]/div/div[2]/div/div/div[2]/div/div[2]/div/div[3]/div/div/span[3]').click()
with open('wuyou_teacher.csv', 'a', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow([jobName, jobSalary, jobCompany, company_type_size, company_status, address_experience_education,job_welf,update_date])
 
        print(jobName, jobSalary, jobCompany, company_type_size, company_status, address_experience_education, job_welf,update_date)

  • 写回答

1条回答 默认 最新

  • Richard.sysout 2023-06-03 18:08
    关注

    哪里有什么双重for循环,这是两个分开的for循环,不存在嵌套关系
    两个循环都是在做查找元素相关的操作,哪里不明白

    评论 编辑记录

报告相同问题?

问题事件

  • 修改了问题 6月3日
  • 修改了问题 6月3日
  • 创建了问题 6月3日

悬赏问题

  • ¥500 把面具戴到人脸上,请大家贡献智慧
  • ¥15 任意一个散点图自己下载其js脚本文件并做成独立的案例页面,不要作在线的,要离线状态。
  • ¥15 各位 帮我看看如何写代码,打出来的图形要和如下图呈现的一样,急
  • ¥30 c#打开word开启修订并实时显示批注
  • ¥15 如何解决ldsc的这条报错/index error
  • ¥15 VS2022+WDK驱动开发环境
  • ¥30 关于#java#的问题,请各位专家解答!
  • ¥30 vue+element根据数据循环生成多个table,如何实现最后一列 平均分合并
  • ¥20 pcf8563时钟芯片不启振
  • ¥20 pip2.40更新pip2.43时报错