I am decoding some XML which contains only string values and attributes. It also contains a few instances of a double-escaped ampersand, "&amp;amp;", which is unfortunate, and I'd like to decode that to just "&" rather than "&amp;". I'm also going to do more work with these string values in which the character "|" must never appear, so I'd like to replace every "|" with "%7C".
I could make these changes with strings.Replace after the decoding, but since the decoder is already doing similar work (after all, it does translate "&amp;" to "&"), I'd like to do it at the same time.
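For reference, the post-decoding fallback mentioned above could look roughly like the sketch below. The names `cleaner` and `clean` are made up for illustration; it uses strings.NewReplacer rather than two strings.Replace calls so that both substitutions happen in one pass.

```go
package main

import (
	"fmt"
	"strings"
)

// cleaner is a hypothetical helper: it turns a leftover "&amp;" into "&"
// and percent-encodes "|" in a single pass over the string.
var cleaner = strings.NewReplacer("&amp;", "&", "|", "%7C")

// clean applies both substitutions to an already-decoded value.
func clean(s string) string {
	return cleaner.Replace(s)
}

func main() {
	fmt.Println(clean("X&amp;Y | pipe"))
}
```

This would have to be called on every decoded field, which is exactly the extra step I would like to avoid.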
The files I will be parsing are huge, so I'll be doing something similar to http://blog.davidsingleton.org/parsing-huge-xml-files-with-go/
Here is a short example XML file:
<?xml version="1.0" encoding="utf-8"?>
<tests>
    <test_content>X&amp;amp;Y is a dumb way to write XnY | also here's a pipe.</test_content>
    <test_attr>
        <test name="Normal" value="still normal" />
        <test name="X&amp;amp;Y" value="should be the same as X&amp;Y | XnY would have been easier." />
    </test_attr>
</tests>
And some Go code that does standard decoding and prints out the results:
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"os"
)

// XMLTests mirrors the <tests> root element.
type XMLTests struct {
	Content string     `xml:"test_content"`
	Tests   []*XMLTest `xml:"test_attr>test"`
}

// XMLTest mirrors a single <test name="..." value="..." /> element.
type XMLTest struct {
	Name  string `xml:"name,attr"`
	Value string `xml:"value,attr"`
}

func main() {
	xmlFile, err := os.Open("test.xml")
	if err != nil {
		fmt.Println("Error opening file:", err)
		return
	}
	defer xmlFile.Close()

	var q XMLTests
	decoder := xml.NewDecoder(xmlFile)
	// I tried this to no avail:
	// decoder.Entity = make(map[string]string)
	// decoder.Entity["|"] = "%7C"
	// decoder.Entity["&amp;"] = "&"

	// Stream tokens so that huge files never have to fit in memory at once.
	for {
		t, err := decoder.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			fmt.Println("Error reading token:", err)
			return
		}
		// Only start elements are interesting; decode <tests> as a whole.
		if se, ok := t.(xml.StartElement); ok && se.Name.Local == "tests" {
			if err := decoder.DecodeElement(&q, &se); err != nil {
				fmt.Println("Error decoding element:", err)
				return
			}
		}
	}

	fmt.Println(q.Content)
	for _, t := range q.Tests {
		fmt.Printf("\t%s\t\t%s\n", t.Name, t.Value)
	}
}
How do I modify this code to get what I want? In other words, how does one customize the decoder?
I looked at the docs, specifically https://golang.org/pkg/encoding/xml/#Decoder and tried playing with the Entity map, but I was unable to make any progress.
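As far as I can tell from the docs, the Entity map is keyed by entity name (the part between "&" and ";"), so it can add new named entities but cannot remap a literal character such as "|". A minimal sketch of what it does do, where the entity name `pipe` and the helper `decodeWithEntities` are invented for this example:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// decodeWithEntities decodes the character data of a single root element,
// resolving non-standard entities through the supplied table.
func decodeWithEntities(doc string, entities map[string]string) (string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	d.Entity = entities // keys are entity names, without "&" and ";"
	var v struct {
		Text string `xml:",chardata"`
	}
	err := d.Decode(&v)
	return v.Text, err
}

func main() {
	// "&pipe;" is a hypothetical entity; the literal "|" is not an entity at all.
	out, err := decodeWithEntities(`<v>a &pipe; b | c</v>`, map[string]string{"pipe": "%7C"})
	fmt.Println(out, err)
}
```

Only the `&pipe;` reference is replaced here; the bare "|" passes through unchanged, which would explain why my attempt with the map had no effect.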
Edit:
Based on the comments, I've followed the example from "Multiple-types decoder in golang" and added/changed the following in the code above:
// Note: this also needs "strings" added to the import block.

// string2 is a string that cleans itself up while being decoded.
type string2 string

type XMLTests struct {
	Content string2    `xml:"test_content"`
	Tests   []*XMLTest `xml:"test_attr>test"`
}

type XMLTest struct {
	Name  string2 `xml:"name,attr"`
	Value string2 `xml:"value,attr"`
}

// UnmarshalXML decodes the element as a plain string, then applies the replacements.
func (s *string2) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	var content string
	if err := d.DecodeElement(&content, &start); err != nil {
		return err
	}
	content = strings.Replace(content, "|", "%7C", -1)
	content = strings.Replace(content, "&amp;", "&", -1)
	*s = string2(content)
	return nil
}
That works for test_content, but not for the attributes:
X&Y is a dumb way to write XnY %7C also here's a pipe.
	Normal		still normal
	X&amp;Y		should be the same as X&Y | XnY would have been easier.
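One direction that might explain this: encoding/xml has a separate interface for attributes, xml.UnmarshalerAttr, with the method UnmarshalXMLAttr(attr xml.Attr) error, and UnmarshalXML is only consulted for elements. The sketch below is untested against my real files; `fix` and `parseTest` are helper names invented here.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

type string2 string

// fix is a hypothetical helper that applies both replacements to a decoded value.
func fix(s string) string2 {
	return string2(strings.NewReplacer("&amp;", "&", "|", "%7C").Replace(s))
}

// UnmarshalXMLAttr is the attribute counterpart of UnmarshalXML; attr.Value
// has already been through the decoder's normal entity handling.
func (s *string2) UnmarshalXMLAttr(attr xml.Attr) error {
	*s = fix(attr.Value)
	return nil
}

type XMLTest struct {
	Name  string2 `xml:"name,attr"`
	Value string2 `xml:"value,attr"`
}

// parseTest decodes a single <test .../> element.
func parseTest(doc string) (XMLTest, error) {
	var t XMLTest
	err := xml.Unmarshal([]byte(doc), &t)
	return t, err
}

func main() {
	t, err := parseTest(`<test name="X&amp;amp;Y" value="a | b" />`)
	fmt.Println(t.Name, t.Value, err)
}
```

If this is right, string2 would need both methods: UnmarshalXML for element content and UnmarshalXMLAttr for attribute values.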