dos49618 2014-11-05 11:41

Parsing a huge XML file with Go

We need to parse a huge XML file using Go. We'd like to use a SAX-like, event-based approach built on the xml.NewDecoder() and decoder.Token() library calls. We've created the appropriate struct types with XML annotations. Everything is easy so far.

Now we go through the file and detect the xml.StartElement tokens, and here comes the problem. We need to decode ONLY the attributes of this start token and then continue into its content. If we call decoder.DecodeElement(), the element's entire content is consumed: it is either decoded or, in our scenario, skipped.

How can we decode only the attributes of a specific StartElement and then continue into the element's body?


1 answer

  • duanrebo3559 2014-11-06 21:24

    I parse Wikipedia XML dumps (~50 GB XML files) in go-wikiparse using plain struct/reflect decoding. It's super simple.

    The strategy is basically this:

    First, read the envelope token:

    // r is an io.Reader over the XML file; the first token is discarded.
    d := xml.NewDecoder(r)
    _, err := d.Token()
    if err != nil {
        return nil, err
    }
    

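    For example, given <someDocument><billions-of-other-things/></someDocument>, that first token is the start tag of someDocument, provided nothing precedes it in the file. If the file begins with an XML declaration or whitespace, those arrive as separate tokens first.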

    Then you can decode each following element into a struct, one per loop iteration:

    var i item
    if err := d.Decode(&i); err != nil {
        return nil, err // io.EOF once the input is exhausted
    }
    

    This uses little RAM, since only one item needs to be held in memory at a time, and it's super easy to parse.

    This answer was selected by the asker as the best answer.
