Decoding with the Python BeautifulSoup module

Running the code below in IDLE produces a warning.
Code:

soup = BeautifulSoup(html.read().decode('utf-8','ignore'), "html")

The warning is:

WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.

The official explanation is:
In rare cases (usually when a UTF-8 document contains text written in a completely different encoding), the only way to get Unicode may be to replace some characters with the special Unicode character “REPLACEMENT CHARACTER” (U+FFFD, �). If Unicode, Dammit needs to do this, it will set the .contains_replacement_characters attribute to True on the UnicodeDammit or BeautifulSoup object. This lets you know that the Unicode representation is not an exact representation of the original–some data was lost. If a document contains �, but .contains_replacement_characters is False, you’ll know that the � was there originally (as it is in this paragraph) and doesn’t stand in for missing data.
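
The effect described in that passage can be reproduced with plain `bytes.decode`, without BeautifulSoup. The following is only a sketch of the underlying behaviour, not BeautifulSoup's internal code; the GBK-encoded sample string is an assumption chosen for illustration.

```python
# Hypothetical sample: the same markup encoded as GBK instead of UTF-8.
raw = "<p>编码</p>".encode("gbk")


def try_strict(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


is_valid_utf8 = try_strict(raw)            # the GBK bytes are not valid UTF-8
ignored = raw.decode("utf-8", "ignore")    # undecodable bytes are silently dropped
replaced = raw.decode("utf-8", "replace")  # undecodable bytes become U+FFFD
correct = raw.decode("gbk")                # the matching codec loses nothing

print(is_valid_utf8)
print(repr(ignored))
print(repr(replaced))
print(correct)
```

Both "ignore" and "replace" lose data; only decoding with the encoding the page was actually written in recovers the original text.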

What should I do?

Answer by mengzhendream, 2016-10-23 07:29:
BeautifulSoup(open(html_path, 'r'),"html.parser",from_encoding="iso-8859-1")
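
A minimal sketch of the same idea on in-memory bytes follows. The sample markup and the choice of GBK are assumptions for illustration; the right value for `from_encoding` depends on how the page was actually encoded. Note that iso-8859-1 maps every possible byte to a character, so it never triggers the warning, but it yields garbled text if the page was really written in another encoding.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a small page encoded as GBK rather than UTF-8.
raw = "<html><body><p>编码测试</p></body></html>".encode("gbk")

# Pass the undecoded bytes and name the encoding explicitly, so that
# BeautifulSoup does the decoding instead of receiving an already-decoded str.
soup = BeautifulSoup(raw, "html.parser", from_encoding="gbk")

print(soup.original_encoding)               # encoding BeautifulSoup ended up using
print(soup.contains_replacement_characters)  # True only if data was lost
print(soup.p.get_text())
```

`from_encoding` only applies to bytes input. In Python 3, a file opened in text mode ('r') hands BeautifulSoup a str and the argument is ignored, so open the file in binary mode ('rb') when using it.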
