I'm using goyaml as a YAML beautifier: by loading and dumping a YAML file, I can source-format it. I unmarshal the data from a YAML source file into a struct, marshal the struct back into bytes, and write those bytes to an output file. But the process turns my Unicode strings into quoted strings full of literal escape sequences, and I don't know how to reverse it.
Example input subtitle.yaml:
line: 你好
I've stripped everything down to the smallest reproducible problem. Here's the code, using _ to discard errors, none of which occur here:
package main

import (
	"io/ioutil"
	//"unicode/utf8"
	//"fmt"

	goyaml "gopkg.in/yaml.v1" // aliased: the package itself is named yaml
)

type Subtitle struct {
	Line string
}

func main() {
	filename := "subtitle.yaml"
	in, _ := ioutil.ReadFile(filename)
	var subtitle Subtitle
	_ = goyaml.Unmarshal(in, &subtitle)
	out, _ := goyaml.Marshal(&subtitle)

	//for len(out) > 0 { // For debugging, see what the runes are
	//	r, size := utf8.DecodeRune(out)
	//	fmt.Printf("%c ", r)
	//	out = out[size:]
	//}

	_ = ioutil.WriteFile(filename, out, 0644)
}
Actual output subtitle.yaml:
line: "\u4F60\u597D"
I want to undo goyaml's escaping once I have the marshaled bytes in the variable out.
The commented-out rune-printing block, which prints a space after each rune for clarity, outputs the following. It shows that out does not contain runes like 你 at all, only the literal characters of their escape sequences:
l i n e : " \ u 4 F 6 0 \ u 5 9 7 D "
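For comparison, here is a minimal standalone sketch of the same rune-by-rune check, run on an escaped string and on raw UTF-8. The helper name spacedRunes is hypothetical and not part of goyaml; it only uses utf8.DecodeRune from the standard library.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// spacedRunes is a hypothetical helper: it decodes b one rune at a time
// and joins the runes with single spaces.
func spacedRunes(b []byte) string {
	var parts []string
	for len(b) > 0 {
		r, size := utf8.DecodeRune(b)
		parts = append(parts, string(r))
		b = b[size:]
	}
	return strings.Join(parts, " ")
}

func main() {
	// Escaped form: every rune is a plain ASCII character.
	fmt.Println(spacedRunes([]byte(`"\u4F60\u597D"`))) // " \ u 4 F 6 0 \ u 5 9 7 D "
	// Raw UTF-8 form: the two Chinese characters decode as single runes.
	fmt.Println(spacedRunes([]byte(`"你好"`))) // " 你 好 "
}
```

If the second form is what shows up for out, the escaping is not happening in the marshal step.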
How can I unquote out, before writing it to the output file, so that the output looks like the input (albeit beautified)?
Desired output subtitle.yaml:
line: "你好"
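One possible stopgap is to post-process out before writing it. The sketch below is an assumption-laden illustration, not goyaml functionality: unescapeUnicode is a hypothetical helper that rewrites \uXXXX and \UXXXXXXXX escapes as UTF-8 and leaves everything else alone. It does not parse YAML, so it would also rewrite a plain or single-quoted scalar whose value is the literal text \u4F60, and it deliberately keeps ASCII and non-printable code points escaped.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode"
)

// unescapeUnicode is a hypothetical helper (not part of goyaml). It replaces
// \uXXXX and \UXXXXXXXX escapes in b with the UTF-8 encoding of the rune,
// but only for printable, non-ASCII runes. All other bytes are copied as-is.
func unescapeUnicode(b []byte) []byte {
	out := make([]byte, 0, len(b))
	for i := 0; i < len(b); i++ {
		if b[i] != '\\' || i+1 >= len(b) {
			out = append(out, b[i])
			continue
		}
		digits := 0
		switch b[i+1] {
		case 'u':
			digits = 4
		case 'U':
			digits = 8
		}
		if digits > 0 && i+2+digits <= len(b) {
			n, err := strconv.ParseUint(string(b[i+2:i+2+digits]), 16, 32)
			if r := rune(n); err == nil && r >= 0x80 && unicode.IsPrint(r) {
				out = append(out, string(r)...)
				i += 1 + digits // skip 'u'/'U' and the hex digits
				continue
			}
		}
		// Copy the backslash together with the byte it escapes, so an
		// escaped backslash followed by "u4F60" is not decoded by mistake.
		out = append(out, b[i], b[i+1])
		i++
	}
	return out
}

func main() {
	escaped := []byte("line: \"\\u4F60\\u597D\"\n")
	fmt.Print(string(unescapeUnicode(escaped))) // line: "你好"
}
```

In the program above, this would be called as out = unescapeUnicode(out) just before ioutil.WriteFile. Fixing the emitter itself is the cleaner route, since it knows which scalars are double-quoted.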
Temporary Solution
I've filed https://github.com/go-yaml/yaml/issues/11. In the meantime, @bobince's tip on yaml_emitter_set_unicode was helpful in uncovering the problem. The function is defined in the library but never called, and there is no option to set it! I changed encode.go and added yaml_emitter_set_unicode(&e.emitter, true) at line 20, and everything works as expected. It would be better to make this optional, but that would require a change to the Marshal API.