If I concatenate a UTF-8 encoded string with an ASCII string in PHP, what is the encoding of the resulting string?

If I use the function mb_convert_encoding() to convert an ASCII encoded string in PHP to a UTF-8 string, then concatenate it with an ASCII encoded string, what encoding is it? Are there any negative consequences for doing this?


2 answers

dongzhong2674 2019-01-30 00:20
It depends first on whether you mean strict ASCII, which contains only 128 characters (byte values 0 to 127). Each of these characters is encoded as exactly the same single byte in ASCII as in UTF-8, so mb_convert_encoding() has no effect on them. You can easily verify this yourself with this script:

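```php
<?php
/* Convert each of the 128 ASCII characters to UTF-8 and compare */
for ($i = 0; $i < 128; $i++) {
    $str1 = chr($i);
    $str2 = mb_convert_encoding($str1, "UTF-8", "ASCII");
    // Print the bytes in hex, since many of these characters are unprintable
    echo bin2hex($str1) . " - " . bin2hex($str2);
    echo ($str1 !== $str2) ? " - DIFFERENT!" : " - same";
    echo "\n";
}
```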

For true ASCII characters, then, there is no point in transcoding: the bytes do not change.

HOWEVER, if by "ASCII" you mean so-called extended ASCII and are talking about accented characters and the like, then you are getting into trouble, because the term does not describe any one definitive character set. Notice that in the list of character encodings supported by PHP's Multibyte String extension, the acronym ASCII appears only once, and that is for ASCII itself.
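
A quick way to see the ambiguity, as a sketch that assumes the mbstring extension is available: take a single byte above 0x7F (so not ASCII at all) and decode it under two different 8-bit code pages, ISO-8859-1 and Windows-1251, both of which people sometimes loosely call "extended ASCII". The byte and code pages are chosen only for illustration.

```php
<?php
// 0xE9 is above 0x7F, so it is not an ASCII byte at all.
$byte = "\xE9";

// The same byte means a different character depending on which
// 8-bit code page you assume it came from.
$asLatin1   = mb_convert_encoding($byte, "UTF-8", "ISO-8859-1");   // "é"
$asCyrillic = mb_convert_encoding($byte, "UTF-8", "Windows-1251"); // "й"

echo $asLatin1, " vs ", $asCyrillic, "\n"; // é vs й
```

Nothing in the byte itself says which reading is right; only knowing where the data came from can decide.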

To answer your questions more precisely:

If I use the function mb_convert_encoding() to convert an ASCII encoded string in PHP to a UTF-8 string, then concatenate it with an ASCII encoded string, what encoding is it?

The resulting string is both ASCII and UTF-8 because both encoding schemes use identical byte encodings for those 128 characters.
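
Keep in mind that a PHP string is just a sequence of bytes with no encoding attached, so the question is really which encodings those bytes are valid in. A minimal sketch, assuming mbstring is installed and using made-up sample strings:

```php
<?php
// PHP strings are plain byte sequences; nothing records an encoding.
$converted = mb_convert_encoding("Hello, ", "UTF-8", "ASCII"); // ASCII -> UTF-8
$ascii     = "world";                                          // plain ASCII bytes
$result    = $converted . $ascii;

// For pure ASCII input the conversion is a no-op: the bytes are identical.
var_dump($converted === "Hello, ");             // bool(true)

// The concatenated bytes are valid in both encodings at once.
var_dump(mb_check_encoding($result, "ASCII"));  // bool(true)
var_dump(mb_check_encoding($result, "UTF-8"));  // bool(true)
```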

Are there any negative consequences for doing this?

There should be no negative consequences under any circumstance if the characters are in fact true ASCII characters.

If, on the other hand, the strings include accented characters like Å or õ and some sloppy coder is calling this "extended ASCII", then you might have problems. Those characters have different encodings in Latin-1 (ISO-8859-1) and UTF-8, for instance.
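
To see those different encodings directly, this sketch prints the bytes of each character in hex. The helper `byteForms` is hypothetical, mbstring is assumed, and the `\u{...}` escapes (PHP 7.0+) always produce UTF-8 bytes, so the result does not depend on how the source file is saved:

```php
<?php
// Hypothetical helper: return the hex bytes of a UTF-8 character
// and of the same character re-encoded as ISO-8859-1 (Latin-1).
function byteForms(string $utf8Char): array
{
    return [
        "utf8"   => bin2hex($utf8Char),
        "latin1" => bin2hex(mb_convert_encoding($utf8Char, "ISO-8859-1", "UTF-8")),
    ];
}

foreach (["\u{C5}", "\u{F5}"] as $char) { // Å and õ
    $forms = byteForms($char);
    echo $char, "  UTF-8: ", $forms["utf8"], "  Latin-1: ", $forms["latin1"], "\n";
}
```

It should print `c385` / `c5` for Å and `c3b5` / `f5` for õ: one byte in Latin-1, two bytes in UTF-8.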

Take a look at the following PHP script; it may shake loose some understanding. Ask yourself what it means to convert a character that is NOT ASCII from ASCII to UTF-8. It is not a meaningful conversion, but in this particular script it does change the result:

```php
<?php
// This file is saved as UTF-8, so "Å" and "õ" are already UTF-8 bytes.
$chars = array("Å", "õ");
foreach ($chars as $char) {
    echo $char . " : ";
    // Pretend the UTF-8 bytes are ASCII / Latin-1 and "convert" them to UTF-8
    $str1 = mb_convert_encoding($char, "UTF-8", "ASCII");
    $str2 = mb_convert_encoding($char, "UTF-8", "ISO-8859-1");
    echo $str1 . " - " . $str2;
    if ($char !== $str1) { echo " - ASCII DIFFERENT"; }
    if ($char !== $str2) { echo " - LATIN 1 DIFFERENT"; }
    echo "\n";
}
```

You might start to get confused at this point. It may help to know that the source file of that last script has its own character encoding, which on my workstation happens to be UTF-8, so the literals "Å" and "õ" are already UTF-8 bytes. The transformations I've performed are therefore pretty stupid: I'm lying to PHP, telling it that these UTF-8 strings are ASCII or Latin-1, and asking it to convert them to UTF-8. PHP performs the conversion as best it can, but we all know the result isn't meaningful.
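
The effect of that lie is commonly called double encoding, one common source of mojibake. A minimal sketch, assuming mbstring, that repeats the Latin-1 half of the script above with the bytes made visible, then undoes it:

```php
<?php
$original = "\u{C5}"; // Å as UTF-8: bytes C3 85

// Mislabel the UTF-8 bytes as ISO-8859-1 and "convert" them to UTF-8.
// Each byte becomes its own Latin-1 character: C3 -> Ã, 85 -> U+0085.
$doubleEncoded = mb_convert_encoding($original, "UTF-8", "ISO-8859-1");
echo bin2hex($original), " -> ", bin2hex($doubleEncoded), "\n"; // c385 -> c383c285

// Reversing the same bogus step (UTF-8 back to ISO-8859-1) recovers the bytes.
$repaired = mb_convert_encoding($doubleEncoded, "ISO-8859-1", "UTF-8");
var_dump($repaired === $original); // bool(true)
```

The repair only works because we know exactly which wrong assumption was made; with data of unknown history, that knowledge is usually missing.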

I hope you can appreciate what I'm getting at here. Every character you see on a computer is stored in some encoding. Whether there are negative consequences depends on how you treat the data that comes to you, what transformations you perform on it, and what you intend to do with it later.

It's helpful to think of a chain of custody. Where did your data come from? What encoding does it use? Is that the encoding your system uses? Where are you sending the data, and does it need to be converted on the way? You should also be careful to specify the character set explicitly for all of these:

data you receive from clients

form submissions to your website

display of HTML on your website

operations on text strings in your applications

character encoding of your database connection, of the tables in your database, and of the columns in those tables

character encoding of stored data

email character encoding

character encoding of data submitted to an API

And so on.
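
One way to put the chain-of-custody idea into practice is to check text where it enters your application and convert it to UTF-8 once, so everything downstream can assume UTF-8. This is only a sketch: `toUtf8` is a hypothetical helper, mbstring is assumed, and the Windows-1252 fallback is an assumption you would replace with whatever encoding your source actually declares.

```php
<?php
// Hypothetical boundary helper: return UTF-8 text, converting only when needed.
// $assumedSource is an assumption about where non-UTF-8 input came from.
function toUtf8(string $input, string $assumedSource = "Windows-1252"): string
{
    if (mb_check_encoding($input, "UTF-8")) {
        return $input; // already valid UTF-8 (this includes pure ASCII)
    }
    return mb_convert_encoding($input, "UTF-8", $assumedSource);
}

echo toUtf8("caf\xC3\xA9"), "\n"; // valid UTF-8, returned unchanged
echo toUtf8("caf\xE9"), "\n";     // Windows-1252 byte E9 converted to é
```

Treating any input that passes the UTF-8 check as UTF-8 is a heuristic, since some legacy byte sequences also happen to be valid UTF-8; when the source declares its charset (an HTTP header, a database setting), trust that declaration instead.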

General rule of thumb: use UTF-8 for everything you possibly can.

This answer was accepted by the asker as the best answer.
