douzhiji2020 2016-08-23 22:24

Golang reads German umlauts incorrectly from stdin

I'm from Germany, so I use umlauts like ä, ö and ü. Golang, however, doesn't read them correctly from stdin.

When I execute this simple program:

package main

import (
    "bufio"
    "fmt"
    "os"
)

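func main() {
    // Create the reader once; a new reader per iteration can drop buffered input.
    r := bufio.NewReader(os.Stdin)
    for {
        b, _, err := r.ReadLine()
        if err != nil {
            return // EOF or read error
        }
        printBytes(b)
    }
}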

func printBytes(bytes []byte) {
    for _, b := range bytes {
        fmt.Printf("0x%X ", b)
    }
    fmt.Println()
}

I get the output:

C:\dev\golang>go run test.go
ä
0xE2 0x80 0x9E

E2 80 9E isn't the correct UTF-8 byte sequence for ä (a character lookup tool tells me it's U+201E "DOUBLE LOW-9 QUOTATION MARK", „), and when I just print out what I've read, it prints „. I've written a small "hack" which seems to read the characters correctly:

package main

/*
#include <stdio.h>
#include <stdlib.h>

char * getline(void) {
    char * line = malloc(100), * linep = line;
    size_t lenmax = 100, len = lenmax;
    int c;

    if(line == NULL)
        return NULL;

    for(;;) {
        c = fgetc(stdin);
        if(c == EOF)
            break;

        if(--len == 0) {
            len = lenmax;
            char * linen = realloc(linep, lenmax *= 2);

            if(linen == NULL) {
                free(linep);
                return NULL;
            }
            line = linen + (line - linep);
            linep = linen;
        }

        if((*line++ = c) == '\n')
            break;
    }
    *line = '\0';
    return linep;
}

void freeline(char* ptr) {
    free(ptr);
}
*/
import "C"

import (
    "fmt"
    "golang.org/x/text/encoding/charmap"
)

func getLineFromCp850() string {
    line := C.getline()
    goline := C.GoString(line)
    C.freeline(line)
    b := []byte(goline)
    ub, _ := charmap.CodePage850.NewDecoder().Bytes(b)
    return string(ub)
}

func main() {
    for {
        line := getLineFromCp850()
        printBytes([]byte(line))
    }

}

func printBytes(bytes []byte) {
    for _, b := range bytes {
        fmt.Printf("0x%X ", b)
    }
    fmt.Println()
}

And it prints out:

C:\dev\golang>go run test.go
ä
0xC3 0xA4 0xA

C3 A4 is the correct byte sequence for ä (0A is the linefeed, which my hack doesn't strip), so reading the input and converting it from CP850 to UTF-8 does the job, as I expected. But why does Go give me gibberish when I read the line with Go's own functionality instead of cgo? What's wrong with Go that it gives me those values? Does it interpret the input bytes as some charset other than CP850? Is there a better Go-only way to handle this problem?

This problem only arises when reading from stdin. When I print out a UTF-8 ä to stdout it prints correctly in the console.

1 answer

  • dsieyx2015 2016-08-25 00:42

    This turned out to be a bug in Golang on some systems: specifically, Windows systems where the system-wide code page and the console code page differ (that is, where GetACP() and GetConsoleCP() from the WinAPI return different values). In Germany, for example (and maybe in other Western European countries), Windows uses code page 1252 system-wide but code page 850 for the cmd.exe console. I'm not sure why, but that's how it is. Golang wrongly used the code page from GetACP() to decode the input to UTF-8 when it should have used the one returned by GetConsoleCP(). We found the problem in the issue I created, and the fix will hopefully be merged for the next version of Golang.

    We also found a second problem on Windows: Golang decoded characters to decomposed UTF-8 sequences (i.e. it would read an ä as the character a followed by U+0308 COMBINING DIAERESIS). That can lead to other problems; for example, printing those decomposed characters may show them as two separate characters instead of one combined character.

    Accepted by the asker as the best answer.
