Hold_C
Acceptance rate: 73.3%
2020-08-10 15:18

scrapy + selenium can't capture the complete NetEase Cloud Music page

Question

The NetEase Cloud Music page I get back is incomplete. Why is that? Any help would be appreciated.

Spider code

import scrapy

class wangyiyun_spider(scrapy.Spider):
    name = 'wy'

    def start_requests(self):
        urls = ['https://music.163.com/']
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Save the rendered page source to a file for inspection
        with open('wz.html', 'wb') as f:
            f.write(response.body)

Middleware code

from selenium import webdriver
from scrapy.http.response.html import HtmlResponse
import time

class SeleniumParseMiddleware_req(object):
    def process_request(self, request, spider):
        options = webdriver.ChromeOptions()
        options.add_argument('--log-level=3')
        browser = webdriver.Chrome(options=options)  # instantiate the browser
        browser.maximize_window()                    # maximize the window
        browser.get(request.url)                     # open the page (use the request's URL rather than a hard-coded one)
        browser.execute_script('window.scrollTo(0, document.body.scrollHeight)')  # scroll to the bottom
        time.sleep(10)

        data = browser.page_source.encode()  # page source as bytes
        browser.quit()  # quit() closes all windows; a separate close() beforehand is redundant
        # Returning a Response from process_request short-circuits the
        # downloader, so this is handed straight to the spider's parse().
        return HtmlResponse(url=request.url, body=data, request=request, encoding='utf-8')

class SeleniumParseMiddleware_res(object):
    def process_response(self, request, response, spider):
        return response

The middlewares are already enabled in settings.py.
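One thing worth checking: on music.163.com the actual page content is rendered inside an iframe (id `g_iframe`), and selenium's `page_source` only returns the document the driver is currently switched to, so the outer page the middleware saves will look mostly empty. A minimal stdlib sketch of the symptom (the outer HTML below is a hand-written stand-in, not the real page):

```python
from html.parser import HTMLParser

# Hand-written stand-in for what driver.page_source returns for the OUTER
# document on music.163.com: the <iframe> element is present, but the
# document rendered inside it is not inlined into the source.
OUTER_PAGE = """<html><body>
<div class="m-top">site navigation</div>
<iframe id="g_iframe" name="contentFrame" src="/discover"></iframe>
</body></html>"""

class TagCollector(HTMLParser):
    """Record every start tag seen while parsing a document."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(OUTER_PAGE)
print("iframe" in collector.tags)  # True  - the frame element itself is there
print("ul" in collector.tags)      # False - the playlist markup lives inside the frame
```

If that is the cause, switching into the frame before reading the source should help, e.g. `browser.switch_to.frame('g_iframe')` right after `browser.get(...)`; `page_source` then returns the framed document instead of the outer shell.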


1 answer