Decoding UTF-8 strings garbles one string but not the other

I'm having a very strange error.

I have verified that both of my strings are UTF-8 (checked with mb_check_encoding and mb_detect_encoding), but when I call utf8_decode on one of them, it returns garbage characters. In that case I do not actually need utf8_decode; the string is fine without it.

The difficulty is that I have customers using UTF-8 databases that I pull strings from, and I use utf8_decode to ungarble the strings for PHP. If I don't, the space characters are replaced with Â. Both customers share the same code to generate the string, but for some reason when I generate it for this other customer, the strings come out all wrong.

Is there a way to verify that I need to use utf8_decode, other than the fact that the string is UTF-8?

Some Examples:

Using utf8_decode for customer 1:
?0,107�per�km
Without utf8_decode for customer 1:
€0,107 per km

Using utf8_decode for customer 2:
$7.00 per km
Without utf8_decode for customer 2:
$7.00Â perÂ km

Thanks guys!


1 answer

duanhong8839 2013-07-17 19:16
mb_detect_encoding without an informed detect_order is no silver bullet, as this demonstrates:

$ php -r 'echo mb_detect_encoding(iconv("utf-8","iso-8859-1","ë"));'
UTF-8

That is obviously wrong. Enabling strict mode helps a little:

$ php -r 'var_dump(mb_detect_encoding(iconv("utf-8","iso-8859-1","ë"),mb_detect_order(),true));'
bool(false)

Why is it false? Well, let's examine the possible character sets mb_detect_encoding() uses in my configuration:

$ php -r 'var_dump(mb_detect_order());'
array(2) {
  [0] =>
  string(5) "ASCII"
  [1] =>
  string(5) "UTF-8"
}

So apart from ASCII and UTF-8, no other character set will ever be detected. Jon has a point though: you can store everything as UTF-8, and with the proper database settings, or even just a correct character_set_results on the MySQL connection (which I assume you use), you can retrieve the data as UTF-8 regardless of how it is stored. However, if that is not an option for some reason I can't think of, it is up to you to specify which character sets are possible for mb_detect_order.

$ php -r 'echo mb_detect_encoding(iconv("utf-8","iso-8859-1","ë"),"ASCII,UTF-8,ISO-8859-1,JIS", true);'
ISO-8859-1
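Once the candidate list is explicit, the detected name can be passed straight to mb_convert_encoding. The following is only a minimal sketch, assuming the mbstring extension is loaded; normalize_to_utf8() and its $legacyCandidates parameter are hypothetical names, not part of PHP. It checks for valid UTF-8 first, because ISO-8859-1 accepts any byte sequence and would otherwise match UTF-8 input as well.

```php
<?php
// Hypothetical helper: return $s as UTF-8, or null if no candidate charset fits.
function normalize_to_utf8(string $s, array $legacyCandidates = ['ISO-8859-1']): ?string
{
    // ASCII is a subset of UTF-8, so this check covers both cases.
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    // Strict detection, restricted to the charsets we know are possible.
    $detected = mb_detect_encoding($s, $legacyCandidates, true);
    if ($detected === false) {
        return null;
    }
    return mb_convert_encoding($s, 'UTF-8', $detected);
}

// "\xEB" is "ë" in ISO-8859-1; "\u{EB}" is the same character in UTF-8.
var_dump(normalize_to_utf8("\xEB") === "\u{EB}");
var_dump(normalize_to_utf8("\u{EB}") === "\u{EB}");
```

Keeping the candidate list short matters here: the more single-byte charsets it contains, the more arbitrary the choice between them becomes.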

In short: you are responsible for providing the list of possible character sets, and if you already have that kind of information, you can probably determine the character set directly (from connection settings, database/table settings, or even just client configuration) rather than trying to detect it.
This answer was accepted by the asker as the best answer.
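As for the two samples in the question, both pass a UTF-8 validity check, so validation alone cannot tell them apart. One possible explanation, which is an assumption and not something the question confirms, is that customer 2's data was encoded to UTF-8 twice. The sketch below is a heuristic for that case only; fix_double_encoded_utf8() is a hypothetical helper, and it uses mb_convert_encoding to perform the same UTF-8 to ISO-8859-1 conversion as utf8_decode.

```php
<?php
// Hypothetical heuristic: undo one layer of accidental ISO-8859-1 -> UTF-8 re-encoding.
function fix_double_encoded_utf8(string $s): string
{
    // Input that is not valid UTF-8 cannot be double-encoded UTF-8.
    if (!mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    // Same conversion that utf8_decode() performs.
    $decoded = mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
    // Pure ASCII comes back unchanged, so there is nothing to fix.
    if ($decoded === $s) {
        return $s;
    }
    // If one decode still yields valid UTF-8, the input was probably encoded twice.
    return mb_check_encoding($decoded, 'UTF-8') ? $decoded : $s;
}

// Customer 1: correctly encoded, with a euro sign and non-breaking spaces.
$customer1 = "\u{20AC}0,107\u{A0}per\u{A0}km";
// Customer 2: each non-breaking space shows up as "Â" plus a non-breaking space.
$customer2 = "\$7.00\u{C2}\u{A0}per\u{C2}\u{A0}km";

var_dump(fix_double_encoded_utf8($customer1) === $customer1);
var_dump(fix_double_encoded_utf8($customer2) === "\$7.00\u{A0}per\u{A0}km");
```

This only patches the symptom. If the data really is double-encoded, fixing the connection or database character set at the source, as suggested above, is the more reliable route.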
