PHP - 如何检测编码？

I'm using Amazon's API to obtain the description of books. The API returns XML responses and the description is marked up (with HTML) very poorly. To deal with this poorly marked up description, which oftentimes breaks the layout of my site, I'm trying to use HTML Tidy to "clean it up."

In order to prevent "weird" characters from being displayed on my web page, I think I need to tell Tidy what the input encoding is and what the desired output encoding is. I know I want the output to be UTF8. However, I'm not sure how to determine the encoding of the input (Amazon's book description).

I've tried something like this:

mb_detect_encoding($amazon_description);

It's helped, but I'm still occasionally getting weird characters (a black diamond with a question mark in it: �). My guess is that I'm not detecting the encoding properly.

Any suggestions what I need to do?

EDIT:

This is my current solution:

$sanitized_amazon_markup = preg_replace('/[^\w`~!@#$%^&*()-=_+[\]{}|;\':",.\/<>? ]/', '', $sanitized_amazon_markup);

I'm not sure about this as this may delete stuff that I should be keeping.

写回答
好问题 0 提建议
追加酬金
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

1条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
doukekui0914 2011-03-10 08:49
关注
Can you provide your tidy repairString call?

If you tried to use input-encoding and output-encoding from tidy options, try to not use these and use the third argument or repairString instead, something like this :

$oTidy = new tidy(); $page_content = $oTidy->repairString($page_content, array("show-errors" => 0, "show-warnings" => false), "utf8" );

Edit :

After doing some tests, what I said before cannot work if you don't have utf8 encoding in $page_content already before calling repairString

But you will mostly end up with ISO-8859-1 (latin1) encoding if not UTF-8 already.

May I suggest you try :

$charset = mb_detect_encoding($amazon_description, 'UTF-8, ISO-8859-1'); if ($charset == "ISO-8859-1") { $amazon_description = utf8_encode($amazon_description); } $oTidy = new tidy(); $amazon_description = $oTidy->repairString($amazon_description, array("show-errors" => 0, "show-warnings" => false), "utf8" );
本回答被题主选为最佳回答 , 对您是否有帮助呢?

解决无用
评论打赏
分享
举报

评论

按下Enter换行，Ctrl+Enter发表内容

报告相同问题？

关注问题

PHP - 如何检测编码？ php
2011-03-10 03:31

回答 1 已采纳 Can you provide your tidy repairString call? If you tried to use input-encoding and output-encodi
日文和俄文字符 - 网络编码？ php python
2012-03-06 10:04

回答 2 已采纳 I just edited the file site.py of python. I follow that guide: click here and everything is ok now
Unix SSH服务器上的Cron Job：UTF-8编码？ php ssh unix
2014-11-02 16:26

回答 1 已采纳 Well, along the way I certainly got to explore some encoding insights but in the end, the solution
PHP如何判断一个汉字是否是utf-8编码？
2018-04-01 09:38

会飞的小蜗的博客在PHP中，有函数mb_detect_encoding
PHP UTF-8编码回应问号 json php
2016-09-03 23:20

回答 2 已采纳 You could try to execute SET NAMES utf8 As a query. After you've made the connection. Then jus
IBM-850编码如何与PHP函数参数一起使用？ php
2017-02-27 15:46

回答 1 已采纳 It hasn't so much to do with IBM-850, that's just a codepage filling out the 8th bit beyond ASCII
字符串化，编码和解码 - 为什么？ ajax javascript json php
2015-09-02 17:42

回答 1 已采纳 First, AJAX is not tied to JSON but is the most used. You could use XML, yaml or you own format. O
php中utf8编码,php中utf-8编码解决十法
2021-03-25 08:36

小小奋斗nice的博客 php用utf-8编程的总结：1、php文件本身必须是utf-8编码。不像java会生成class文件，避免这个问题。2、php要输出头：header(”content-type: text/html; charset=utf-8″)3、meta标签无所谓，有header所有浏览器就会...
PHP：csv文件编码？ php
2013-06-11 10:27

回答 1 已采纳 Your guess it correct: file -i NL_JGFR_130326_bac.csv NL_JGFR_130326_bac.csv: text/plain; charse
如何在PHP中将m3u8转换为base64编码？ php
2019-06-30 08:51

回答 1 已采纳 it should work like this: <script> var player = new Clappr.Player({ source: window.atob
为什么php文件编码影响响应数据编码？ json php
2017-02-09 05:10

回答 1 已采纳 PHP expects scripts to conform to the Unicode standard. According to the Unicode standard, the BOM
web安全-PHP-url编码取反绕过正则-思考
2022-03-09 21:34

Red snow的博客 web安全-PHP-url编码取反绕过正则-思考
特殊äö字符打破UTF-8编码 php
2019-02-28 13:22

回答 1 已采纳 It's not that these characters have broken the encoding, it's just that Unicode is really complica
PHP检测文件编码格式
2017-07-25 15:36

Nassir1995的博客 7/25/2017 PM 实现思路通过file_get_contents()将源文件强制转码，再将结果与原文件比较，若相同则强制转码所使用的编码方式便是源文件的编码方式。
php查看字符编码,PHP实现检测当前字符编码并转码的方法
2021-04-04 08:19

朵儿来啦的博客本文主要和大家分享PHP实现检测当前字符编码并转码的方法，结合文字和代码，希望能帮助到大家。一、检测当前字符串编码并将编码改为utf-81 获取当前字符串的编码$encode = mb_detect_encoding($str, array("ASCII",'...
没有解决我的问题, 去提问

悬赏问题

¥15 虚幻5 UE美术毛发渲染
¥15 CVRP 图论物流运输优化
¥15 Tableau online 嵌入ppt失败
¥100 支付宝网页转账系统不识别账号
¥15 基于单片机的靶位控制系统
¥15 真我手机蓝牙传输进度消息被关闭了，怎么打开？(关键词-消息通知)
¥15 装 pytorch 的时候出了好多问题，遇到这种情况怎么处理？
¥20 IOS游览器某宝手机网页版自动立即购买JavaScript脚本
¥15 手机接入宽带网线，如何释放宽带全部速度
¥30 关于#r语言#的问题：如何对R语言中mfgarch包中构建的garch-midas模型进行样本内长期波动率预测和样本外长期波动率预测

PHP - 如何检测编码？

1条回答 默认 最新

Edit :

悬赏问题

1条回答默认最新