drgm51600 2016-05-31 18:05

Custom string conversion when decoding XML in Golang

I am decoding some XML that contains only string values and attributes. It also contains a few instances of "&amp;amp;", which is unfortunate, and I'd like to decode those to just "&" rather than "&amp;". I'm also going to do more work with these string values, and there the character "|" must never appear, so I'd like to replace every "|" with "%7C".

I could make these changes with strings.Replace after decoding, but since the decoder already does similar work (after all, it translates "&amp;" to "&"), I'd like to do it at the same time.
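Whichever hook ends up calling it, the post-processing itself can be done in one pass. A minimal sketch using the standard library's strings.NewReplacer in place of two chained strings.Replace calls; the names sanitizer and sanitize are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizer performs both substitutions in a single left-to-right pass.
// A strings.Replacer is safe for concurrent use, so one package-level
// value can be shared by every decoding goroutine.
var sanitizer = strings.NewReplacer(
	"&amp;", "&", // collapse the double-escaped ampersand left after XML decoding
	"|", "%7C", // percent-encode the pipe character
)

// sanitize applies both substitutions to an already-decoded string.
func sanitize(s string) string {
	return sanitizer.Replace(s)
}

func main() {
	fmt.Println(sanitize("X&amp;Y | XnY"))
}
```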

The files I will be parsing are huge, so I'll be doing something similar to http://blog.davidsingleton.org/parsing-huge-xml-files-with-go/

Here is a short example xml file:

<?xml version="1.0" encoding="utf-8"?>
<tests>
    <test_content>X&amp;amp;Y is a dumb way to write XnY | also here's a pipe.</test_content>
    <test_attr>
      <test name="Normal" value="still normal" />
      <test name="X&amp;amp;Y" value="should be the same as X&amp;Y | XnY would have been easier." />
    </test_attr>
</tests>

And some Go code that does standard decoding and prints out the results:

package main

import (
    "encoding/xml"
    "fmt"
    "os"
)

type XMLTests struct {
    Content string     `xml:"test_content"`
    Tests   []*XMLTest `xml:"test_attr>test"`
}

type XMLTest struct {
    Name  string `xml:"name,attr"`
    Value string `xml:"value,attr"`
}

func main() {
    xmlFile, err := os.Open("test.xml")
    if err != nil {
        fmt.Println("Error opening file:", err)
        return
    }
    defer xmlFile.Close()

    var q XMLTests

    decoder := xml.NewDecoder(xmlFile)

    // I tried this to no avail:
    // decoder.Entity = make(map[string]string)
    // decoder.Entity["|"] = "%7C"
    // decoder.Entity["&amp;amp;"] = "&"

    var inElement string
    for {
        t, _ := decoder.Token()
        if t == nil {
            break
        }
        switch se := t.(type) {
        case xml.StartElement:
            inElement = se.Name.Local
            if inElement == "tests" {
                decoder.DecodeElement(&q, &se)
            }
        default:
        }
    }

    fmt.Println(q.Content)
    for _, t := range q.Tests {
        fmt.Printf("\t%s\t\t%s\n", t.Name, t.Value)
    }
}

How do I modify this code to get what I want? That is, how does one customize the decoder?

I looked at the docs, specifically https://golang.org/pkg/encoding/xml/#Decoder, and tried playing with the Entity map, but I was unable to make any progress.
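For reference, the Entity map only supplies replacement text for non-standard named entities such as `&nbsp;`. Its keys are bare entity names (no `&` or `;`), the five predefined XML entities (amp, lt, gt, apos, quot) are always resolved by the parser regardless of the map, and a literal `|` is not an entity at all, so neither of the commented-out assignments above can take effect. A small sketch of what the map is actually for; decodeWithEntities is a hypothetical helper:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// decodeWithEntities decodes the character data of a single root element,
// resolving extra named entities from the supplied map.
func decodeWithEntities(doc string, entities map[string]string) (string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	// Keys are bare entity names. The standard XML entities are resolved
	// by the parser itself, so they cannot be redefined through this map.
	d.Entity = entities
	var s string
	err := d.Decode(&s)
	return s, err
}

func main() {
	s, err := decodeWithEntities(`<p>a&nbsp;b &amp;amp;</p>`, map[string]string{"nbsp": "\u00a0"})
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	// &nbsp; came from the map; &amp;amp; was unescaped only once, to "&amp;".
	fmt.Printf("%q\n", s)
}
```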

Edit:

Based on the comments, I followed the example from "Multiple-types decoder in golang" and added or changed the following in the code above:

type string2 string

type XMLTests struct {
    Content string2    `xml:"test_content"`
    Tests   []*XMLTest `xml:"test_attr>test"`
}

type XMLTest struct {
    Name  string2 `xml:"name,attr"`
    Value string2 `xml:"value,attr"`
}

func (s *string2) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
    var content string
    if err := d.DecodeElement(&content, &start); err != nil {
        return err
    }
    content = strings.Replace(content, "|", "%7C", -1)
    content = strings.Replace(content, "&amp;", "&", -1)
    *s = string2(content)
    return nil
}

That works for test_content, but not for the attributes. Why? Here is the output:

X&Y is a dumb way to write XnY %7C also here's a pipe.
    Normal      still normal
    X&amp;Y     should be the same as X&Y | XnY would have been easier.

1 answer

  • doutong2132 2016-06-06 15:23

    To handle attributes, implement the xml.UnmarshalerAttr interface, i.e. its UnmarshalXMLAttr method. UnmarshalXML is only called for elements; attribute values are passed to UnmarshalXMLAttr instead. Your example then becomes:

    package main
    
    import (
        "encoding/xml"
        "fmt"
        "strings"
    )
    
    type string2 string
    
    type XMLTests struct {
        Content string2    `xml:"test_content"`
        Tests   []*XMLTest `xml:"test_attr>test"`
    }
    
    type XMLTest struct {
        Name  string2 `xml:"name,attr"`
        Value string2 `xml:"value,attr"`
    }
    
    func decode(s string) string2 {
        s = strings.Replace(s, "|", "%7C", -1)
        s = strings.Replace(s, "&amp;", "&", -1)
        return string2(s)
    }
    
    func (s *string2) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
        var content string
        if err := d.DecodeElement(&content, &start); err != nil {
            return err
        }
        *s = decode(content)
        return nil
    }
    
    func (s *string2) UnmarshalXMLAttr(attr xml.Attr) error {
        *s = decode(attr.Value)
        return nil
    }
    
    func main() {
        xmlData := `<?xml version="1.0" encoding="utf-8"?>
    <tests>
        <test_content>X&amp;amp;Y is a dumb way to write XnY | also here's a pipe.</test_content>
        <test_attr>
          <test name="Normal" value="still normal" />
          <test name="X&amp;amp;Y" value="should be the same as X&amp;Y | XnY would have been easier." />
        </test_attr>
    </tests>`
        xmlFile := strings.NewReader(xmlData)
    
        var q XMLTests
    
        decoder := xml.NewDecoder(xmlFile)
        if err := decoder.Decode(&q); err != nil {
            fmt.Println("Error decoding XML:", err)
            return
        }
    
        fmt.Println(q.Content)
        for _, t := range q.Tests {
            fmt.Printf("\t%s\t\t%s\n", t.Name, t.Value)
        }
    }
    

    Output:

    X&Y is a dumb way to write XnY %7C also here's a pipe.
        Normal      still normal
        X&Y     should be the same as X&Y %7C XnY would have been easier.
    

    (You can test this in the Go playground.)

    So if using string2 everywhere is suitable for you, this should do the trick.

    (Edit: simplified the code; main now calls decoder.Decode directly instead of walking the tokens with a type switch and calling DecodeElement.)

    This answer was accepted by the asker as the best answer.
