I have a large number of XML files to parse, some of which contain unclosed tags nested inside a properly closed tag. For example:
<submission>
<first-name>Henry
<last-name>Donald
<id>4224
</submission>
I set dec.Strict = false, but the decoder is still unable to parse the entire XML file properly:
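package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// sub holds the sample document shown above.
const sub = "<submission>\n<first-name>Henry\n<last-name>Donald\n<id>4224\n</submission>"

type Submission struct {
	FirstName string `xml:"first-name"`
	LastName  string `xml:"last-name"`
	ID        string `xml:"id"`
}

func main() {
	dec := xml.NewDecoder(bytes.NewReader([]byte(sub)))
	dec.Strict = false // tolerate mismatched or missing end tags
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity
	var s Submission
	err := dec.Decode(&s)
	if err != nil {
		fmt.Println(err)
	}
	fmt.Println(s)
}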
Playground: https://play.golang.org/p/-_chEpDhzX
I know there is an HTML tokenizer that I could try, but I would prefer to stay with the xml package, since the majority of the files are well-formed.