duandun3178 2015-04-07 18:14
Parsing malformed XML files in Go

I have a large number of XML files to parse that contain unclosed tags wrapped in a properly closed tag, something like this:

<submission>
<first-name>Henry
<last-name>Donald
<id>4224
</submission>

I set decoder.Strict = false, but the decoder is still unable to parse the whole XML file properly.

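package main

import (
    "bytes"
    "encoding/xml"
    "fmt"
)

// sub is assumed to hold the malformed sample shown above;
// its definition is not part of the original snippet.
const sub = `<submission>
<first-name>Henry
<last-name>Donald
<id>4224
</submission>`

type Submission struct {
    FirstName string `xml:"first-name"`
    LastName  string `xml:"last-name"`
    ID        string `xml:"id"`
}

func main() {
    dec := xml.NewDecoder(bytes.NewReader([]byte(sub)))
    // Relax the parser as far as encoding/xml allows.
    dec.Strict = false
    dec.AutoClose = xml.HTMLAutoClose
    dec.Entity = xml.HTMLEntity

    var s Submission
    err := dec.Decode(&s)
    if err != nil {
        fmt.Println(err)
    }

    fmt.Println(s)
}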

Playground: https://play.golang.org/p/-_chEpDhzX

I know there is an HTML tokenizer that I could try, but I would prefer to use the XML package because the majority of the files are well-formed.

2 answers

  • dqy0707 2015-04-07 18:40
    There is no way around it. You need your own decoder: http://play.golang.org/p/Kr7nq64f-c