别折磨了 2023-04-22 14:08 Acceptance rate: 57.1%
Views: 102
Closed

How do I get the data inside each individual link after collecting all the links on a page?

# I can collect every link on the page, but I cannot then open the links one by one and read the data inside each of them

import requests
from time import sleep
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('https://www.amazon.de/')
word = input('Enter the keyword you want to search for: ')
driver.find_element(by=By.NAME, value="field-keywords").send_keys(word)
sleep(2)
driver.find_element(By.XPATH, "//input[@type='submit']").click()
driver.find_element(By.ID, "nav-search-submit-button").click()
url = 'https://www.amazon.de/s?k={}'.format(word)
headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0',
    'Referer': 'https://www.amazon.de/'
}
res = requests.get(url=url, headers=headers)
html_data = res.text

for links in driver.find_elements(By.XPATH,
                                  '//*[@class="a-link-normal s-underline-text s-underline-link-text s-link-style '
                                  'a-text-normal"]'):
    sleep(1)
    print(links.get_attribute('href'))
a = []
for links in driver.find_elements(By.XPATH,
                                      '//*[@class="a-link-normal s-underline-text s-underline-link-text s-link-style '
                                      'a-text-normal"]'):
    sleep(1)
    print(links.get_attribute('href'))
    a.append(links.get_attribute('href'))
    driver.find_element(By.XPATH, '//*[@class="a-link-normal s-underline-text s-underline-link-text s-link-style '
                                  'a-text-normal"]').click()
    driver.find_element(By.ID, "sellerProfileTriggerId").click()
    box = driver.find_element(By.XPATH, "/html/body/div[1]/div[2]/div/div/div/div/div[9]/div/div/div").text
    print(box)
    driver.back()
    driver.back()


for i in adriver.find_element(By.XPATH, '//*[@class="a-link-normal s-underline-text s-underline-link-text s-link-style '
                              'a-text-normal"]').click():

    print(i)
    element = WebDriverWait(driver, 10).until(EC.element_to_be_clickable(
        (By.XPATH, '//*[@class="a-link-normal s-underline-text s-underline-link-text s-link-style a-text-normal"]')))
    element.click()


# The error reported is

Traceback (most recent call last):
  File "C:/Users/Administrator/PycharmProjects/pythonProject/amzone/进阶.py", line 36, in <module>
    print(links.get_attribute('href'))
  File "F:\venv\lib\site-packages\selenium\webdriver\remote\webelement.py", line 179, in get_attribute
    f"/* getAttribute */return ({getAttribute_js}).apply(null, arguments);", self, name
  File "F:\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 500, in execute_script
    return self.execute(command, {"script": script, "args": converted_args})["value"]
  File "F:\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 440, in execute
    self.error_handler.check_response(response)
  File "F:\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 245, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: The element with the reference d629e098-386b-4e55-abac-2271d0ca6c39 is stale; either its node document is not the active document, or it is no longer connected to the DOM
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:182:5
StaleElementReferenceError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:484:5
element.getKnownElement@chrome://remote/content/marionette/element.sys.mjs:488:11
deserializeJSON@chrome://remote/content/marionette/json.sys.mjs:233:33
cloneObject/result<@chrome://remote/content/marionette/json.sys.mjs:50:52
cloneObject@chrome://remote/content/marionette/json.sys.mjs:50:25
deserializeJSON@chrome://remote/content/marionette/json.sys.mjs:244:16
cloneObject@chrome://remote/content/marionette/json.sys.mjs:56:24
deserializeJSON@chrome://remote/content/marionette/json.sys.mjs:244:16
json.deserialize@chrome://remote/content/marionette/json.sys.mjs:248:10
receiveMessage@chrome://remote/content/marionette/actors/MarionetteCommandsChild.sys.mjs:85:30

# As a first attempt I tried driver.refresh and time.sleep, but the same error still occurs


4 answers

  • Zyb0627 2023-04-22 17:38

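    Answer quoted from ChatGPT: this error is caused by a stale element. It usually means the page changed after the element was located (for example, you clicked through to another page and came back), so the old element reference can no longer be used.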

    You need to locate the element again before clicking it, rather than reusing the reference you obtained earlier.
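If you do want to keep the click-and-go-back flow, one way to apply this is to remember only the position of each result and look the list up again on every pass. The following is a minimal sketch, not a tested Amazon scraper: `visit_each_result` and the `handle_page` callback are hypothetical names, and `locator` is assumed to be a `(By.XPATH, "...")` style pair like the one in the question.

```python
def visit_each_result(driver, locator, handle_page):
    """Click every matching element in turn, re-locating the list each time.

    driver      -- a Selenium WebDriver (anything with find_elements/back)
    locator     -- a (strategy, value) pair, e.g. (By.XPATH, "//a[...]")
    handle_page -- hypothetical callback that reads data from the opened page
    """
    results = []
    count = len(driver.find_elements(*locator))
    for index in range(count):
        # Look the elements up again on every pass: references obtained
        # before the previous click/back round trip belong to an old
        # document and would raise StaleElementReferenceException.
        elements = driver.find_elements(*locator)
        if index >= len(elements):
            break  # the list became shorter after navigating back
        elements[index].click()
        results.append(handle_page(driver))
        driver.back()
    return results
```

On a real page you would probably also wait for the result list to reappear after `driver.back()` before the next lookup.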

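    Also, after opening a link, wait for the page to finish loading before you locate elements on it, otherwise you may get element-not-found errors. You can use WebDriverWait to wait until a specific element appears.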
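To make the idea of an explicit wait concrete, here is a hand-rolled polling helper. It is only an illustration of the principle, not Selenium's implementation, and `wait_until` is a hypothetical name; in real code `WebDriverWait(driver, 10).until(...)` is the better choice.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

With a driver at hand it could be called as `wait_until(lambda: driver.find_elements(By.ID, "productTitle"))`, because `find_elements` returns an empty list while nothing matches. Unlike a fixed `sleep`, this returns as soon as the element shows up.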

    Here is a modified version of the code for reference:

    from time import sleep
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Firefox()
    driver.get('https://www.amazon.de/')
    word = input('Enter the keyword you want to search for: ')
    driver.find_element(by=By.NAME, value="field-keywords").send_keys(word)
    sleep(2)
    driver.find_element(By.CSS_SELECTOR, "input.nav-input[type='submit']").click()

    # Wait for the search results to load
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.s-search-results")))

    # Read the hrefs as plain strings while the elements are still valid
    links = driver.find_elements(By.CSS_SELECTOR, 'a.a-link-normal.s-no-outline')
    url_list = [link.get_attribute('href') for link in links]
    # Drop missing hrefs and duplicates, keeping the original order
    url_list = list(dict.fromkeys(url for url in url_list if url))

    # Open each product page by URL instead of clicking the old elements
    for url in url_list:
        driver.get(url)
        # Wait for the product page to load
        WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "productTitle")))
        # find_elements returns an empty list when the page has no seller link
        seller = driver.find_elements(By.ID, "sellerProfileTriggerId")
        print(seller[0].get_attribute('href') if seller else 'no seller link found')

    driver.quit()
    

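    In this code I use By.CSS_SELECTOR to locate elements and WebDriverWait to wait for a specific element to appear. The hrefs are read into a list of plain strings before any navigation happens, so the loop can open each link with driver.get, wait for the page to load, and then read the data without touching the old elements again.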
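Search result pages often link to the same product more than once, with only the query string differing, so it can help to clean the URL list before looping over it. The sketch below assumes the query string and fragment are not needed to reach the page, which you should check for your site; `unique_page_urls` is a hypothetical helper and the URLs in the usage note are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def unique_page_urls(hrefs):
    """Strip query strings and fragments, then de-duplicate in order."""
    seen = {}
    for href in hrefs:
        if not href:
            continue  # get_attribute('href') gives None if the attribute is missing
        parts = urlsplit(href)
        if parts.scheme not in ("http", "https"):
            continue  # skip javascript: links and similar
        clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        seen.setdefault(clean, None)  # dicts keep insertion order
    return list(seen)
```

For example, `unique_page_urls(["https://www.amazon.de/dp/B000000001?keywords=x", "https://www.amazon.de/dp/B000000001?keywords=y"])` yields a single entry, so that page is only visited once.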


Question events

  • Closed on April 27
  • Question created on April 22
