drl47263 2014-06-18 19:41
579 views
Answer accepted

Golang: decoding/unmarshaling invalid Unicode in JSON

I am fetching JSON files in Go that are not formatted consistently. For example, I can have the following:

{"email": "\"blah.blah@blah.com\""}
{"email": "robert@gmail.com"}
{"name": "m\303\203ead"}

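As you can see, the escape sequences in the last line will be a problem: JSON only allows the escapes \", \\, \/, \b, \f, \n, \r, \t and \uXXXX, so octal escapes such as \303 are invalid. When decoding with json.Decoder's Decode method, for this line: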

{"name": "m\303\203ead"}

I get the error: invalid character '3' in string escape code

I have tried several approaches to normalising my data, for example by going through a string array (it works, but there are too many edge cases) or by filtering out escape characters.

Finally, I came across this article, http://blog.golang.org/normalization, and the solution it proposes seemed very interesting.

I have tried the following:

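// Needs: encoding/json, io, log, unicode,
// golang.org/x/text/transform and golang.org/x/text/unicode/norm.
isMn := func(r rune) bool {
    return unicode.Is(unicode.Mn, r) // Mn = nonspacing marks, e.g. combining accents
}

// As in the blog post: decompose (NFD), drop nonspacing marks, recompose (NFC).
t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)

// bucket and filename come from the surrounding code.
fileReader, err := bucket.GetReader(filename)
if err != nil {
    log.Fatal(err)
}

transformReader := transform.NewReader(fileReader, t)
decoder := json.NewDecoder(transformReader)

for {
    var dataModel Model
    if err := decoder.Decode(&dataModel); err == io.EOF {
        break
    } else if err != nil {
        log.Fatal(err) // e.g. invalid character '3' in string escape code
    }
    // DO SOMETHING with dataModel
}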

With Model being:

type Model struct {
    Name  string `json:"name" bson:"name"`
    Email string `json:"email" bson:"email"` 
}

I have tried several variations of it, but haven't been able to get it working.

So my question is: how can I easily decode/unmarshal JSON data with different encodings, given that I have no control over those JSON files?

If you are reading this, thank you anyway.


1 answer

  • douhu2898 2014-06-18 21:40

    You can use json.RawMessage instead of string. The decoder then stores the field's raw bytes, quotes included, and doesn't try to unescape them or convert them into a Go string.

    Playground: http://play.golang.org/p/fB-38KGAO0

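    package main

    import (
        "encoding/json"
        "fmt"
        "strings"
    )

    // Model keeps the name as raw JSON bytes (quotes included); the decoder
    // does not unescape them or convert them to a Go string.
    type Model struct {
        N json.RawMessage `json:"name" bson:"name"`
    }

    // Name returns the raw JSON text of the field, including the quotes.
    func (m *Model) Name() string {
        return string(m.N)
    }

    func main() {
        // In this interpreted Go literal, \303\203 are Go octal escapes, so the
        // decoder receives the bytes 0xC3 0x83 rather than a backslash escape.
        s := "{\"name\": \"m\303\203ead\"}"
        r := strings.NewReader(s)
        d := json.NewDecoder(r)
        m := Model{}

        fmt.Println(d.Decode(&m))
        fmt.Println(m.Name())
    }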
    

    Edit: You could also use a regex, though I'm not sure how viable that is for you (http://play.golang.org/p/VYJKTKmiYm):

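    // Replaces main in the program above; add "regexp" to its imports.
    // Matches a backslash plus three digits; \b requires a word character
    // immediately before the backslash.
    var octalEscape = regexp.MustCompile(`\b(\\\d\d\d)`)

    // cleanUp rewrites \NNN as \u0NNN. Note that this reads the three digits
    // as hex, so \303 becomes \u0303 (U+0303) rather than the byte 0xC3.
    func cleanUp(s string) string {
        return octalEscape.ReplaceAllStringFunc(s, func(s string) string {
            return `\u0` + s[1:]
        })
    }

    func main() {
        // Raw string literal, so the input really contains the backslash escapes.
        s := `{"name": "m\303\203ead"}`
        s = cleanUp(s)
        r := strings.NewReader(s)
        d := json.NewDecoder(r)
        m := Model{}
        fmt.Println(d.Decode(&m))
        fmt.Println(m.Name())
    }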
    
    This answer was selected by the asker as the best answer.
