douzhi1972 2018-11-04 05:07

Unmarshaling JSON that contains unprintable ASCII characters

Using Go, how can I unmarshal a JSON string that contains unprintable ASCII characters?

For example:

testJsonString := "{\"test_one\" : \"123\x10456\x0B789\v123\a456\"}"
var dat map[string]interface{}
err := json.Unmarshal([]byte(testJsonString), &dat)
if err != nil {
    panic(err)
}

Yields:

panic: invalid character '\x10' in string literal

goroutine 1 [running]:
main.main()
    /tmp/sandbox903140350/main.go:14 +0x180

https://play.golang.org/p/mFGWzndDK8V

Unfortunately I do not have control over the source data, so I need a way to ignore or strip out the unprintable characters.

A related issue is that the data also contains a few C escape sequences, such as \0 and \a, that I need to strip out. If I replace the string above with the one below, the program fails as well. Essentially, it fails on any C escape sequence (https://en.wikipedia.org/wiki/Escape_sequences_in_C) that is not also a valid JSON escape.

testJsonString := "{\"test_one\" : \"123456789\\a123456\"}"

will error out with

panic: invalid character 'a' in string escape code

goroutine 1 [running]:
main.main()
    /tmp/sandbox322770276/main.go:12 +0x100

This string cannot be unmarshaled either, but the problem can't be caught by checking rune values or Unicode properties, since Go sees a backslash followed by the character 'a', both of which are legal characters on their own.

Is there a good way to handle these edge cases?

1 answer

  • douzhannao5357 2018-11-04 08:26

    According to the JSON spec (RFC 8259, https://www.rfc-editor.org/rfc/rfc8259), control characters (U+0000 through U+001F) may not appear unescaped inside a string. They have to be escaped (for example as \u0010) or otherwise encoded before parsing.

    So here's a converter that turns non-printable characters into their URI-escaped (percent-encoded) forms, e.g. \x10 becomes %10. The result can then be fed into json.Unmarshal. Note that the decoded value then contains the text %10 rather than the original byte.

    If this isn't exactly the behaviour you need, modify the converter to drop the characters (with continue), replace them with a question mark rune, or whatever suits your data.

    BTW, the second problem with \\a does not "print out as expected" for me. Please give a better example that actually shows the problem you are experiencing.

        package main

        import (
            "bytes"
            "encoding/json"
            "fmt"
            "net/url"
            "unicode"
        )

        // safety drops backslashes and percent-encodes non-printable characters
        // so that the result can be passed to json.Unmarshal.
        func safety(d string) []byte {
            var buffer bytes.Buffer
            for _, c := range d {
                s := string(c)
                if c == '\\' {
                    // Drop every backslash. Note that this also removes
                    // valid JSON escapes such as \n or \".
                    continue
                }
                if unicode.IsPrint(c) {
                    buffer.WriteString(s)
                } else {
                    // e.g. "\x10" becomes "%10"
                    buffer.WriteString(url.QueryEscape(s))
                }
            }
            return buffer.Bytes()
        }

        func main() {
            testJsonString := "{\"test_one\" : \"123\x10456\x0B789\v123\a456\"}"
            var dat map[string]interface{}
            err := json.Unmarshal(safety(testJsonString), &dat)
            if err != nil {
                panic(err)
            }
            fmt.Printf("%v\n", dat)
        }
    