duanou2016 2019-01-29 17:25

If I concatenate a UTF-8 encoded string with an ASCII string in PHP, what is the encoding of the resulting string?

If I use the function mb_convert_encoding() to convert an ASCII encoded string in PHP to a UTF-8 string, then concatenate it with an ASCII encoded string, what encoding is it? Are there any negative consequences for doing this?

2 answers

  • dongzhong2674 2019-01-30 08:20

    It would depend firstly on whether you mean strict ASCII, which only includes 128 characters. Every single one of these characters has the exact same encoding in the ASCII encoding scheme as it does in the UTF-8 encoding scheme. For these characters, the mb_convert_encoding function will have no effect. You can easily verify this yourself with this script:

    /* Convert each of the 128 ASCII characters to UTF-8 and compare */
    for ($i = 0; $i < 128; $i++) {
        $str1 = chr($i);
        $str2 = mb_convert_encoding($str1, "UTF-8", "ASCII");

        echo $str1 . " - " . $str2;

        // Strict comparison: the byte sequences must be identical
        if ($str1 !== $str2) {
            echo " - DIFFERENT!";
        } else {
            echo " - same";
        }
        echo "\n";
    }
    

    For all of these true ASCII characters, there's no point in transcoding them.

    HOWEVER, if by "ASCII" you mean extended ASCII and are talking about characters with accents and the like, then you are heading for trouble, because the term does not describe any single, definitive character set. You'll notice that in the list of supported character encodings for PHP's Multibyte String extension, the acronym ASCII occurs only once, and that is for ASCII itself.

    To answer your questions more precisely:

    If I use the function mb_convert_encoding() to convert an ASCII encoded string in PHP to a UTF-8 string, then concatenate it with an ASCII encoded string, what encoding is it?

    The resulting string is both ASCII and UTF-8 because both encoding schemes use identical byte encodings for those 128 characters.
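
    A minimal sketch of that claim, assuming the mbstring extension is loaded. The two example strings are hypothetical; any strings made only of bytes 0x00 to 0x7F should behave the same way:

```php
<?php
// Hypothetical example strings containing only 7-bit ASCII bytes.
$converted = mb_convert_encoding("Hello, ", "UTF-8", "ASCII");
$plain     = "world!";

// Concatenation simply appends bytes; it performs no transcoding.
$joined = $converted . $plain;

var_dump($converted === "Hello, ");            // the conversion changed nothing
var_dump(mb_check_encoding($joined, "ASCII")); // valid when read as ASCII
var_dump(mb_check_encoding($joined, "UTF-8")); // valid when read as UTF-8
```

    All three checks should report true, since no byte in the result is above 0x7F.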

    Are there any negative consequences for doing this?

    There should be no negative consequences under any circumstance if the characters are in fact true ASCII characters.

    If, on the other hand, the strings include some accented character like Å or õ and some sloppy coder is calling this "extended ASCII" then you might have problems. Those characters have different encodings in the latin-1 and UTF-8 encoding schemes, for instance.
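
    To make the byte-level difference concrete, here is a small sketch, again assuming mbstring is available. The character is written as explicit bytes so the result does not depend on how the source file itself is saved, and the variable names are only illustrative:

```php
<?php
// "Å" (U+00C5) as explicit bytes, independent of the file's own encoding.
$utf8   = "\xC3\x85";                                        // two bytes in UTF-8
$latin1 = mb_convert_encoding($utf8, "ISO-8859-1", "UTF-8"); // one byte in Latin-1

echo bin2hex($utf8), "\n";   // c385
echo bin2hex($latin1), "\n"; // c5

// Concatenating the two representations does not convert anything. The lone
// 0xC5 byte opens a two-byte UTF-8 sequence that the following space never
// completes, so the combined string is not valid UTF-8.
$mixed = $latin1 . " and " . $utf8;
var_dump(mb_check_encoding($mixed, "UTF-8")); // bool(false)
```

    This is the kind of problem that mixing genuinely different encodings in one string can cause: the result is a byte sequence that no single encoding label describes correctly.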

    Take a look at the following PHP script; it may shake loose some understanding. Ask yourself what it means to convert a character which is NOT ASCII from ASCII to UTF-8. It is not a meaningful conversion, but it does result in a change in this particular script:

    $chars = array("Å", "õ");
    foreach ($chars as $char) {
        echo $char . " : ";
        // Convert the original character each time, first claiming it is
        // ASCII, then claiming it is Latin-1
        $str1 = mb_convert_encoding($char, "UTF-8", "ASCII");
        $str2 = mb_convert_encoding($char, "UTF-8", "ISO-8859-1");
        echo $str1 . " - " . $str2;

        if ($char !== $str1) {
            echo " - ASCII DIFFERENT";
        }
        if ($char !== $str2) {
            echo " - LATIN 1 DIFFERENT";
        }
        echo "\n";
    }
    

    You might start to get confused at this point. It might help to know that the PHP source file containing that last script has its own character encoding, which on my workstation happens to be UTF-8. The transformations I've performed are therefore pretty stupid. I'm lying to PHP, saying that these UTF-8 strings are ASCII or Latin-1 and asking PHP to transform them to UTF-8. It performs a transformation as best it can, but we all know that transformation isn't meaningful.
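
    What that "lie" does to the bytes can be sketched as follows. This assumes mbstring is loaded and again spells out "Å" as explicit UTF-8 bytes; the variable names are illustrative only:

```php
<?php
$utf8 = "\xC3\x85"; // "Å", already encoded as UTF-8

// Mislabelled as Latin-1: each byte is treated as a character of its own
// (0xC3 is "Ã", 0x85 is a control character) and re-encoded separately.
$doubleEncoded = mb_convert_encoding($utf8, "UTF-8", "ISO-8859-1");
echo bin2hex($doubleEncoded), "\n"; // c383c285

// Because Latin-1 maps every byte to a code point, this particular mistake
// can be undone by converting in the opposite direction.
$restored = mb_convert_encoding($doubleEncoded, "ISO-8859-1", "UTF-8");
var_dump($restored === $utf8); // bool(true)

// Mislabelled as ASCII: bytes above 0x7F are not valid ASCII, so mbstring
// cannot keep them and the original character is lost.
$fromAscii = mb_convert_encoding($utf8, "UTF-8", "ASCII");
var_dump($fromAscii === $utf8); // bool(false)
```

    The Latin-1 case is the classic "double encoding" that shows up as sequences like "Ã…" on web pages.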

    I hope you can appreciate what I'm getting at here. Every time you see a character on a computer, it has some encoding. Whether or not there are any negative consequences will depend on how you treat the data that comes to you, what transformations you perform on it, and what you intend to do with it later.

    It's helpful to think of a chain of custody. Where did your data come from? What encoding did they use? Is that what I'm using on my system? Where am I sending this data? Does it need to be converted? You should also be careful to specify character sets for all these things:

    • data you receive from clients
    • form submissions to your website
    • display of html on your website
    • operations on text strings in your applications
    • character encoding of your connection to a database, character encoding of the tables in your db and encodings of the columns in those tables
    • character encoding of stored data
    • email character encoding
    • character encoding of data submitted to an API

    And so on.
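
    One way to apply the chain-of-custody idea is to normalise data once, at the point where it enters your application. The helper below is a hypothetical sketch (the name toUtf8 and its parameters are my own, not part of PHP); it assumes mbstring is loaded and that the sender has told you which encoding it used:

```php
<?php
/**
 * Hypothetical helper: validate incoming bytes against the encoding the
 * sender declared, then convert them to UTF-8.
 */
function toUtf8(string $bytes, string $declaredEncoding): string
{
    // Refuse data that does not match its label instead of guessing.
    if (!mb_check_encoding($bytes, $declaredEncoding)) {
        throw new InvalidArgumentException(
            "Input is not valid " . $declaredEncoding
        );
    }
    return mb_convert_encoding($bytes, "UTF-8", $declaredEncoding);
}

// Latin-1 "Å" (byte 0xC5) becomes the two-byte UTF-8 form.
echo bin2hex(toUtf8("\xC5", "ISO-8859-1")), "\n"; // c385

// True ASCII passes through unchanged.
echo toUtf8("plain text", "ASCII"), "\n"; // plain text
```

    Rejecting mislabelled input is a design choice in this sketch; whether to reject, log, or attempt a fallback depends on where the data comes from.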

    General rule of thumb: use UTF-8 for everything you possibly can.
