chenyus 2024-08-31 09:22
Status: closed

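A question about #python#: scraping a website with httpx.get

Scraping "https://www.mouser.cn/c/semiconductors/memory-ics/dram/?pg=" with httpx.get still returns a 403, and the response body says "You don't have permission to access". Is there any way to solve this?
The code is as follows: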


import httpx
import random

headers_list = [
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/537.75.14",
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)",
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11',
    'Opera/9.25 (Windows NT 5.1; U; en)',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
    'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12',
    'Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.2.9',
    "Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.7 (KHTML, like Gecko) Ubuntu/11.04 Chromium/16.0.912.77 Chrome/16.0.912.77 Safari/535.7",
    "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0 "
]

baseurl = "https://www.mouser.cn/c/semiconductors/memory-ics/dram/?pg="  # URL of the page to scrape
url = baseurl
n_header = {'User-Agent': random.choice(headers_list)}    # pick a random User-Agent
response = httpx.get(url, headers=n_header, timeout=10)
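
Before switching tools, it can help to confirm that the response really is the access-denied page rather than a transient error. The following is a minimal sketch, assuming the page number is simply appended to the `?pg=` parameter (the question does not confirm this). `page_url`, `looks_blocked` and `fetch_page` are hypothetical helper names, not part of httpx.

```python
import random
from typing import Optional, Sequence

BASE_URL = "https://www.mouser.cn/c/semiconductors/memory-ics/dram/?pg="
BLOCK_MARKER = "You don't have permission to access"  # message quoted in the question


def page_url(page: int, base: str = BASE_URL) -> str:
    """Append the page number to the base URL (assumed pagination scheme)."""
    return f"{base}{page}"


def looks_blocked(status_code: int, body: str) -> bool:
    """Return True when the response looks like the access-denied page."""
    return status_code == 403 or BLOCK_MARKER in body


def fetch_page(page: int, user_agents: Sequence[str]) -> Optional[str]:
    """Fetch one listing page; return its HTML, or None if the request was blocked."""
    import httpx  # imported here so the helpers above work without httpx installed

    headers = {"User-Agent": random.choice(list(user_agents))}
    response = httpx.get(page_url(page), headers=headers, timeout=10)
    if looks_blocked(response.status_code, response.text):
        return None
    return response.text
```

If `fetch_page` keeps returning `None` even with a current browser User-Agent string, the block is probably not based on the User-Agent header alone.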

1 answer

  • 猿途纪 (featured creator: programming frameworks) 2024-09-05 16:44

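    Your requests are being detected as crawler traffic. I recommend using a dedicated scraping tool that simulates real browser behavior.
    Install Selenium and a WebDriver; if you use Chrome, download chromedriver.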

       from selenium import webdriver
       from selenium.webdriver.chrome.service import Service
       from selenium.webdriver.common.by import By
       from selenium.webdriver.chrome.options import Options
       import time

       # Configure Chrome options
       chrome_options = Options()
       chrome_options.add_argument("--headless")  # headless mode (no visible window)
       chrome_options.add_argument("--disable-gpu")

       # Start the WebDriver
       service = Service(executable_path='path/to/chromedriver')  # replace with the actual path
       driver = webdriver.Chrome(service=service, options=chrome_options)

       try:
           url = "https://www.mouser.cn/c/semiconductors/memory-ics/dram/"
           driver.get(url)

           # Wait for the page to finish loading
           time.sleep(3)

           # Get the page content
           page_source = driver.page_source
           print(page_source)

           # Parse the page further, e.g. extract specific elements.
           # The selectors below are illustrative; check them against the page's actual markup.
           products = driver.find_elements(By.CSS_SELECTOR, 'div.product-item')
           for product in products:
               product_name = product.find_element(By.CSS_SELECTOR, 'h3.product-name').text
               product_price = product.find_element(By.CSS_SELECTOR, 'span.product-price').text
               print(f'Product Name: {product_name}, Price: {product_price}')

       finally:
           driver.quit()
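
The fixed `time.sleep(3)` in the code above is either wasted time or too short when the page loads slowly. An alternative is to poll until the elements actually appear. Selenium provides `WebDriverWait` for this purpose; the sketch below shows the same polling idea in plain Python so that it runs without a browser. `wait_until` is a hypothetical helper name, not a Selenium function.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value.

    Raises TimeoutError if no truthy value is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

With the `driver` from the answer, `products = wait_until(lambda: driver.find_elements(By.CSS_SELECTOR, 'div.product-item'))` could stand in for the sleep and the first `find_elements` call, since an empty result list is falsy and triggers another poll.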
       
    
    
    This answer was accepted by the asker as the best answer.


Question events

  • Closed by the system on September 30
  • Answer accepted on September 22
  • Question created on August 31