Go: Decoding only one XML node at a time

Looking through the source code of the encoding/xml package, all of the unmarshaling logic (which decodes the actual XML nodes and types them) lives in unmarshal, and essentially the only way to invoke it is by calling DecodeElement. However, the unmarshaling logic also inherently searches for the matching EndElement, apparently for validation. This looks like a major design flaw to me: what if I have a massive XML file, I am sufficiently confident in its structure, and I'd just like to decode a single node at a time so that I can filter the data efficiently on the fly? RawToken() can be used to get the current tag, which is great, but when I then call DecodeElement() on it, I get an error once the inevitable unmarshal() call starts running into nodes that it perceives as unbalanced.

It seems theoretically possible to encounter a token that I'd like to decode, capture the offset, decode the element, seek back to the previous position, and loop, but that'd still result in a massive amount of unnecessary processing.

Is there no way to just parse one node at a time?

1 answer

duanaozhong0696 2016-01-23 12:42

What you describe is called XML stream parsing, as done by any SAX parser, for example. Good news: encoding/xml supports it, although the feature is a bit hidden.

What you actually have to do is create an xml.Decoder by passing an io.Reader to xml.NewDecoder. Then you call Decoder.Token() to read the input stream up to the next valid XML token. From there, you can decide what to do next.

Here is a little example, also available as a gist, or you can run it on the Go Playground:

package main

import (
    "bytes"
    "encoding/xml"
    "fmt"
)

const (
    book = `<?xml version="1.0" encoding="UTF-8"?>
<book>
  <preface>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</preface>
  <chapter num="1" title="Foo">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</chapter>
  <chapter num="2" title="Bar">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</chapter>
</book>`
)

type Chapter struct {
    Num     int    `xml:"num,attr"`
    Title   string `xml:"title,attr"`
    Content string `xml:",chardata"`
}

func main() {

    // We emulate a file or network stream
    b := bytes.NewBufferString(book)

    // And set up a decoder
    d := xml.NewDecoder(b)

    for {

        // We look for the next token
        // Note that this only reads until the next positively identified
        // XML token in the stream
        t, err := d.Token()

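        // Token returns io.EOF at the end of the input; this example
        // simply stops on any error without reporting it
        if err != nil {
            break
        }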

        switch et := t.(type) {

        case xml.StartElement:
            // We now have to inspect whether we are interested in the element;
            // otherwise we simply advance to the next token
            if et.Name.Local == "chapter" {
                // Most often/likely element first

                c := &Chapter{}

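                // We decode the element into c, which automatically advances
                // the stream past the element's matching end tag.
                // If no matching end tag is found, there will be an error.
                // Note that decoding stops at this element's end tag; the
                // rest of the stream is left untouched.
                // c is already a pointer, so it is passed as-is.
                if err := d.DecodeElement(c, &et); err != nil {
                    panic(err)
                }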

                // We have found what we are interested in, so we print it
                fmt.Printf("%d: %s\n", c.Num, c.Title)

            } else if et.Name.Local == "book" {
                fmt.Println("Book begins!")
            }

        case xml.EndElement:

            if et.Name.Local != "book" {
                continue
            }

            fmt.Println("Finished processing book!")
        }
    }
}
