被蛇咬的工程师 2021-09-14 08:08
Closed

I get an error when scraping Baidu Images with Python requests, and I can't tell what is causing it


Here is the code:


import requests
import time
import json


class Image(object):
    def __init__(self):
        self.url = "https://image.baidu.com/search/acjson?"
        self.header = {
            "Cookie": "BDqhfp=%E7%8B%97%26%260-10-1undefined%26%2612968%26%267; BIDUPSID=A9057507F795F3B64AAD1B3C8EEE77F4; PSTM=1628755996; BAIDUID=A9057507F795F3B68C23CF2841217265:FG=1; __yjs_duid=1_6b5e03d0f7ee9d96564f91318a17c3f61629164474262; BDSFRCVID_BFESS=IykOJexroG0YRbTHbwHVKreH-9NbUdrTDYrEQ-mAp1wm6V8VJeC6EG0Pts1-dEu-EHtdogKKBmOTHnuF_2uxOjjg8UtVJeC6EG0Ptf8g0M5; H_BDCLCKID_SF_BFESS=tR3j3Ru8KJjEe-Kk-PnVeUF-MPnZKRvHa2kjhxotQpnDfpnKyn5qyKr-Xx5PKnjn3N5HKxDEalj2ht3zjlDW3xI8LNj405OTbTADsRbNb66pO-bghPJvyUDDXnO72JQlXbrtXp7_2J0WStbKy4oTjxL1Db3JKjvMtgDtVJO-KKChbKPwjf5; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; H_PS_PSSID=34440_34145_31254_34551_33848_34525_34585_34092_34106_26350_34428_34556; delPer=0; PSINO=2; BAIDUID_BFESS=D9039E16E07D4F59A09083EAD9E25BBB:FG=1; BA_HECTOR=a1200h058k21a48kd41gjvkgd0q; BDRCVFR[X_XKQks0S63]=mk3SLVN4HKm; userFrom=www.baidu.com; firstShowTip=1; indexPageSugList=%5B%22%E7%8B%97%22%5D; cleanHistoryStatus=0; BDRCVFR[dG2JNJb_ajR]=mk3SLVN4HKm; ab_sr=1.0.1_NzRlNmE0M2VjMzU2NWQ2YjU5Yjk4NjQ2ZWRhZjEwZTE5MjJiYTBkNzJlODk3MDM0NmUwMjg5M2YzZmUwZTY0YWMwYWI1M2Y3OGVlMjFhZDBkZjU2YWQwOTMyOWVmOTlhYjgxMGJhNjViYzk1NDk1MWNlN2EyN2NkOTkwOWJmMjRmMmIwZGZhNzdlMGNlMTJiOWQ4MTZkZjkxYWU1ZDZhYg==; BDRCVFR[-pGxjrCMryR]=mk3SLVN4HKm",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36 Edg/93.0.961.47"
        }

        self.params = {
            " n": "resultjson_com",
            " logid": "8234466765640172309",
            "ipn": "rj",
            "ct": "201326592",
            "is": "",
            "fp": "result",
            "queryWord": "狗",
            "cl": "2",
            "lm": "-1",
            "ie": "utf-8",
            "oe": "utf-8",
            "adpicid": "",
            "st": "-1",
            "z": "",
            "ic": "0",
            "hd": "",
            "latest": "",
            "copyright": "",
            "word": "狗",
            "s": "",
            "se": "",
            "tab": "",
            "width": "",
            "height": "",
            "face": "0",
            "istype": "2",
            "qc": "",
            "nc": "1",
            "fr": "",
            "expermode": "",
            "nojc": "",
            "pn": "180",
            "rn": "30",
            "gsm": "3c",
            "time": ""
        }
        self.image_list = []

    def get_image(self, num):
        for i in range(0, num):
            self.params["time"] = int(time.time() * 1000)
            self.params["pn"] = i * 30
            response = requests.get(url=self.url, headers=self.header, params=self.params)

            for j in range(0, len(response.json()["data"])-1):
                self.image_list.append(response.json()["data"][j]["thumbURL"])


    def save_image(self):
        n = 1
        for i in self.image_list:
            image = requests.get(url=i)
            with open("./图片/{}.jpg".format(n), "wb") as f:
                f.write(image.content)
                n += 1


if __name__ == '__main__':
    image = Image()
    image.get_image(3)
    image.save_image()


2 answers

  • CSDN专家-showbo 2021-09-14 08:50

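    The main problem is that the tn parameter is never sent: your code spells it " n" (the t is missing), so the endpoint returns a 404 HTML page instead of the query results. Rename the parameter to tn and the request works.
    When scraping, parameter names must match exactly. Some request headers should be sent as well, such as User-Agent, Referer, and Cookie, because the endpoint may validate them.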
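Because the endpoint answers with an HTML page when a parameter is wrong, calling `.json()` directly only surfaces a decoding error. Below is a minimal sketch of a check that reports what actually came back. The helper name `parse_json_body` and the `preview_chars` parameter are hypothetical and not part of requests; the function uses only the standard library.

```python
import json


def parse_json_body(status_code, body_text, preview_chars=80):
    """Decode a response body as JSON, or raise ValueError describing what came back."""
    preview = body_text[:preview_chars]
    if status_code != 200:
        # An error page is usually HTML, so show the status and the start of the body.
        raise ValueError(f"HTTP {status_code}; body starts with: {preview!r}")
    try:
        return json.loads(body_text)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"body is not JSON ({exc.msg}); it starts with: {preview!r}"
        ) from exc
```

With requests, this could be called as `parse_json_body(response.status_code, response.text)` in place of `response.json()`, so a 404 page shows up as a readable message rather than a bare parsing failure.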


            self.params = {
                "tn": "resultjson_com",  # the parameter name was wrong here; "logid" below also had an extra leading space
                "logid": "8234466765640172309",
                "ipn": "rj",
                "ct": "201326592",
                "is": "",
                "fp": "result",
                "queryWord": "狗",
                "cl": "2",
                "lm": "-1",
                "ie": "utf-8",
                "oe": "utf-8",
                "adpicid": "",
                "st": "-1",
                "z": "",
                "ic": "0",
                "hd": "",
                "latest": "",
                "copyright": "",
                "word": "狗",
                "s": "",
                "se": "",
                "tab": "",
                "width": "",
                "height": "",
                "face": "0",
                "istype": "2",
                "qc": "",
                "nc": "1",
                "fr": "",
                "expermode": "",
                "nojc": "",
                "pn": "180",
                "rn": "30",
                "gsm": "3c",
                "time": ""
            }
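A stray space in a parameter name is easy to miss in a dict this long. Here is a rough sketch of a sanity check that flags such names and previews the resulting query string. `keys_with_stray_whitespace` is a hypothetical helper, and `urllib.parse.urlencode` is only a stand-in for previewing the encoding; requests may build the final URL slightly differently.

```python
from urllib.parse import urlencode


def keys_with_stray_whitespace(params):
    """Return parameter names that start or end with whitespace."""
    return [key for key in params if key != key.strip()]


# A shortened copy of the parameters from the question, typos included.
broken = {" n": "resultjson_com", " logid": "8234466765640172309", "ipn": "rj"}

print(keys_with_stray_whitespace(broken))  # [' n', ' logid']
print(urlencode(broken))  # the leading spaces appear as "+" before the names
```

The check only points at suspicious names. It cannot tell that " n" was meant to be tn, so the names still have to be compared against the request shown in the browser's developer tools.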
    
    
    
    
    The asker accepted this answer as the best answer.


Question events

  • Closed by the system on September 22
  • Answer accepted on September 14
  • Question created on September 14
