Python crawler: why does bs4's find method return an empty value (None)? How do I fix it? (Solved)

The code is as follows:

import time
import random
import requests
import urllib.request
from bs4 import BeautifulSoup
headers=("User-Agent","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36")
opener=urllib.request.build_opener()
opener.addheaders=[headers]
urllib.request.install_opener(opener)
class UserAgent():

    def _get_UA(self,html):
        soup = BeautifulSoup(html, "html.parser")
        ip_get = []
        ip_list = soup.find_all("tr")
        for i in range(1,len(ip_list)):
            ip_both = ip_list[i].find_all("td")
            front = ip_both[1].text+':'
            ip_get.append(front+ip_both[2].text)
        time.sleep(random.randint(15,20))
        return ip_get

    def _get_html(self,html):
        if html is None:
            this_html=urllib.request.urlopen('https://www.xicidaili.com/nn/1')
        else:
            soup = BeautifulSoup(html,"html.parser")
            next_page_url = soup.find("a",class_="next_page")
            print(next_page_url)
            html = urllib.request.urlopen('https://www.xicidaili.com'+next_page_url)
            this_html = html
        return this_html

The error is in the else branch of the _get_html method. The URL being passed in is fine: I can open https://www.xicidaili.com/nn/1 normally in a browser.
The main script is as follows:

    n = User_Agent.UserAgent()
    ip_html = n._get_html(None)

    fake_ip = n._get_UA(ip_html)
    ip_html = n._get_html(ip_html)

And this is what the error says:

Traceback (most recent call last):
  File "E:\java4412\spider_demo\book_spider\main.py", line 21, in <module>
None
    ip_html = n._get_html(ip_html)
  File "E:\java4412\spider_demo\book_spider\User_Agent.py", line 35, in _get_html
    html = urllib.request.urlopen('https://www.xicidaili.com'+next_page_url)
TypeError: Can't convert 'NoneType' object to str implicitly

Could someone experienced take a look and tell me what is wrong with this code? This beginner is about to go crazy...
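One detail in the main script may be worth checking. This is offered as a hypothesis, not a confirmed diagnosis. The same response object is handed first to _get_UA and then to _get_html, and each of them builds a BeautifulSoup from it. A file-like object gives up its bytes only once, so the second parse would see an empty document and find would return None, which matches the None printed in the traceback. The sketch below uses io.BytesIO as a stand-in for the urlopen response (an assumption made so that it runs without network access):

```python
import io

# Stand-in for the object returned by urllib.request.urlopen(); both are
# file-like and can be read to the end only once.
response = io.BytesIO(b"<a class='next_page' href='/nn/2'>next</a>")

first_read = response.read()   # what the first parse of the response receives
second_read = response.read()  # what a second parse of the same object receives

print(first_read)   # the full markup
print(second_read)  # b'' because nothing is left to read
```

If this is what happened, reading the response once and reusing the bytes (or the parsed soup) avoids the empty second parse.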

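========================= Divider =========================
Problem solved.
The cause was that I had been using one fixed header the whole time.
I found a User_Agent collection that someone else had gathered, and the code now picks a header from it at random.
The modified code is as follows: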

class UserAgent():

    def _get_UA(self,soup):
        headers=("User-Agent",Headers.getheaders())
        opener=urllib.request.build_opener()
        opener.addheaders=[headers]
        urllib.request.install_opener(opener)
#         soup = BeautifulSoup(html, "html.parser")
        ip_get = []
        ip_list = soup.find_all("tr")
        for i in range(1,len(ip_list)):
            ip_both = ip_list[i].find_all("td")
            front = ip_both[1].text+':'
            ip_get.append(front+ip_both[2].text)
        time.sleep(random.randint(15,20))
        return ip_get

    def _get_html_first(self):    
        headers=("User-Agent",Headers.getheaders())
        opener=urllib.request.build_opener()
        opener.addheaders=[headers]
        urllib.request.install_opener(opener)
        this_html=urllib.request.urlopen('https://www.xicidaili.com/nn/1')
        soup = BeautifulSoup(this_html,"html.parser")
        return soup
    def _get_soup(self,soup):
        headers=("User-Agent",Headers.getheaders())
        opener=urllib.request.build_opener()
        opener.addheaders=[headers]
        urllib.request.install_opener(opener)
        next_page_url = soup.find("a",class_="next_page").get('href')
        print(next_page_url)
        html = urllib.request.urlopen('https://www.xicidaili.com'+next_page_url)
        soup = BeautifulSoup(html,'html.parser')
        return soup

After these modifications it runs correctly. The print() call in there is one I added to check the result.
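The Headers.getheaders() helper used above is not shown in the post. A minimal sketch of the same idea, assuming a small hypothetical pool called USER_AGENTS and a hypothetical helper build_request, might look like this. It sets the header per request instead of reinstalling a global opener each time:

```python
import random
import urllib.request

# Hypothetical pool for illustration; the post uses its own Headers.getheaders()
# helper, whose contents are not shown.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:66.0) Gecko/20100101 Firefox/66.0",
]


def build_request(url):
    """Return a Request that carries a randomly chosen User-Agent header."""
    return urllib.request.Request(
        url, headers={"User-Agent": random.choice(USER_AGENTS)}
    )


request = build_request("https://www.xicidaili.com/nn/1")
# urllib stores header names capitalized, hence "User-agent" in the lookup.
print(request.get_header("User-agent"))
# To actually fetch the page: urllib.request.urlopen(request)
```

Building a Request object keeps the header choice local to each call, so the three methods would not need to repeat the opener setup.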


1 answer

weixin_42062762 2019-08-18 15:56
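If find returns an empty value, it simply did not find anything. First print what the request returned and confirm that it really contains content. Then look at the tag you are passing to find, since the tag may be wrong.
Start with the parent tag, print it to see whether that works, and go down one level at a time.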
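The check described above can be sketched roughly as follows. The HTML snippets and the helper name next_page_href are made up for illustration, and only the a.next_page selector comes from the question. Note that find returns a Tag object or None, so the href has to be read with .get("href") before it can be joined to a URL string:

```python
from bs4 import BeautifulSoup


def next_page_href(html):
    """Return the href of the a.next_page link, or None if the link is missing."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("a", class_="next_page")  # a Tag, or None when nothing matches
    if link is None:
        return None
    return link.get("href")  # the attribute value as a str


# Illustrative inputs, not the real page.
page_with_link = '<div class="pagination"><a class="next_page" href="/nn/2">next</a></div>'
page_without_link = "<html><body>blocked</body></html>"

print(next_page_href(page_with_link))     # /nn/2
print(next_page_href(page_without_link))  # None
```

Checking for None before concatenating turns the TypeError into an explicit "link not found" case that can be logged or retried.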

Also, I can't open https://www.xicidaili.com/nn/1 at all.

This answer was selected by the asker as the best answer.

Python crawler: why does bs4's find method return an empty value (None)? (Solved) · python · 2019-08-18 15:16 · 1 answer, accepted