YaoRaoLov 2018-11-05 04:33 采纳率: 50%
浏览 228

C Unicode: 我如何应用 C11标准修正案 DR488修正 C11标准功能 c16rtomb () ?

Question:

As mentioned in the C reference page for the function, c16rtomb, from CPPReference, under the Notes section:

In C11 as published, unlike mbrtoc16, which converts variable-width multibyte (such as UTF-8) to variable-width 16-bit (such as UTF-16) encoding, this function can only convert single-unit 16-bit encoding, meaning it cannot convert UTF-16 to UTF-8 despite that being the original intent of this function. This was corrected by the post-C11 defect report DR488.

And below this passage, the C reference page provided an example source code with the following sentence above it:

Note: this example assumes the fix for the defect report 488 is applied.

That phrase implied there is a way to take DR488 and somehow "apply" the fix to the C11 standard function, c16rtomb.

I would like to know how to apply the fix for GCC. Because it seems to me the fix was already applied to Visual Studio 2017 Visual C++, as of v141.

The behavior seen in GCC, when debugging the code in GDB, is consistent with what was found in DR488, as follows:

Section 7.28.1 describes the function c16rtomb(). In particular, it states "When c16 is not a valid wide character, an encoding error occurs". "wide character" is defined in section 3.7.3 as "value representable by an object of type wchar_t, capable of representing any character in the current locale". This wording seems to imply that, e.g. for the common cases (e.g, an implementation that defines __STDC_UTF_16__ and a program that uses an UTF-8 locale), c16rtomb() will return -1 when it encounters a character that is encoded as multiple char16_t (for UTF-16 a wide character can be encoded as a surrogate pair consisting of two char16_t). In particular, c16rtomb() will not be able to process strings generated by mbrtoc16().

The boldfaced text is the behavior described.

Source code:

#include <stdio.h>
#include <uchar.h>

#define __STD_UTF_16__

int main() {
    char16_t* ptr_string = (char16_t*) u"我是誰";

    //C++ disallows variable-length arrays. 
    //GCC uses GNUC++, which has a C++ extension for variable length arrays.
    //It is not a truly standard feature in C++ pedantic mode at all.
    //https://stackoverflow.com/questions/40633344/variable-length-arrays-in-c14
    char buffer[64];
    char* bufferOut = buffer;

    //Must zero this object before attempting to use mbstate_t at all.
    mbstate_t multiByteState = {};

    //c16 = 16-bit Characters or char16_t typed characters
    //r = representation
    //tomb = to Multi-Byte Strings
    while (*ptr_string) {
        char16_t character = *ptr_string;
        size_t size = c16rtomb(bufferOut, character, &multiByteState);
        if (size == (size_t) -1)
            break;
        bufferOut += size;
        ptr_string++;
    }

    size_t bufferOutSize = bufferOut - buffer;
    printf("Size: %zu - ", bufferOutSize);
    for (int i = 0; i < bufferOutSize; i++) {
        printf("%#x ", +(unsigned char) buffer[i]);
    }

    //This statement is used to set a breakpoint. It does not do anything else.
    int debug = 0;
    return 0;
}

Output from Visual Studio:

Size: 9 - 0xe6 0x88 0x91 0xe6 0x98 0xaf 0xe8 0xaa 0xb0

Output from GCC:

Size: 0 -

转载于:https://stackoverflow.com/questions/53148386/c-unicode-how-do-i-apply-c11-standard-amendment-dr488-fix-to-c11-standard-funct

  • 写回答

0条回答 默认 最新

    报告相同问题?

    悬赏问题

    • ¥15 求解答一道线性规划题,用lingo编程运行,第一问要求写出数学模型和lingo语言编程模型,第二问第三问解答就行,我的ddl要到了谁来求了
    • ¥15 Ubuntu在安装序列比对软件STAR时出现报错如何解决
    • ¥50 树莓派安卓APK系统签名
    • ¥15 maple软件,用solve求反函数出现rootof,怎么办?
    • ¥65 汇编语言除法溢出问题
    • ¥15 Visual Studio问题
    • ¥20 求一个html代码,有偿
    • ¥100 关于使用MATLAB中copularnd函数的问题
    • ¥20 在虚拟机的pycharm上
    • ¥15 jupyterthemes 设置完毕后没有效果