dozr13344 2010-09-15 00:09

What does a basic DOM XML parser need?

I've started programming in Google's Go language, and the package I'm attempting to write is an API for processing and creating DOCX files (I'm familiar with this topic and thought it would be a good way to learn Go). As DOCX files are primarily ZIP archives with various XML files inside them, I need a DOM XML parser. However, I was unable to find any native Go DOM XML parsers; the only ones I saw seemed to be very limited, and were probably SAX parsers (anyone who uses Go, correct me if I'm wrong).

So this past weekend I wrote a very basic DOM XML parser that was able to parse one of the simpler XML files within the DOCX package and output it back intact. At the moment I'm not going to bother with Namespace, XSLT, or schema validation support, as those aren't useful for manipulating DOCX files. My question is, what other XML standards and functionality would be important to incorporate into the parser?

At the moment, it only creates a tree of elements and attributes, which I can modify and save. I'm not currently handling CDATA sections or XML escape characters (though those would be easy to do, and I'll get to that this weekend).
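For the escaping step on the output side, here is a minimal sketch, assuming Go's standard `encoding/xml` package is available and acceptable for this purpose. `EscapeString` is a hypothetical helper name, not something from the parser described above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// EscapeString is a hypothetical helper (not part of any library). It returns s
// with XML-significant characters replaced by entity or character references,
// so the result can be written back out as element content.
func EscapeString(s string) (string, error) {
	var b strings.Builder
	if err := xml.EscapeText(&b, []byte(s)); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := EscapeString("a < b & c")
	if err != nil {
		fmt.Println("escape failed:", err)
		return
	}
	fmt.Println(out) // a &lt; b &amp; c
}
```

A hand-written serializer would call something like this for every text node and attribute value it writes.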


2 answers

  • doujingjiao0015 2010-09-15 00:29

    First of all: if you specifically want to write a DOM parser, you need to implement the DOM API. But I am not sure that is what you actually mean; perhaps you just mean an XML parser that produces an XML tree model (a "DOM" in the loose sense), or just an XML parser? DOM is hardly the only way. Also note that building the DOM tree model on top of a SAX parser is the most common approach; few if any DOM packages have embedded parsers, and the parser is commonly exposed separately.
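As a rough illustration of layering a tree model on top of an event-style parser, here is a minimal sketch, assuming Go's standard `encoding/xml` token decoder as the SAX-like event source. The `Node` type and the `BuildTree` and `ParseString` functions are hypothetical names chosen for this example, and the tree is deliberately much simpler than the W3C DOM API.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Node is a hypothetical minimal tree node; it is not the W3C DOM API.
type Node struct {
	Name     xml.Name
	Attrs    []xml.Attr
	Text     string // character data found directly inside this element
	Children []*Node
	Parent   *Node
}

// BuildTree consumes parser events (tokens) from r and assembles a tree.
func BuildTree(r io.Reader) (*Node, error) {
	dec := xml.NewDecoder(r)
	var root, cur *Node
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			// Copy the token so the attributes stay valid after the next call.
			n := &Node{Name: t.Name, Attrs: t.Copy().Attr, Parent: cur}
			if cur == nil {
				root = n
			} else {
				cur.Children = append(cur.Children, n)
			}
			cur = n
		case xml.EndElement:
			if cur != nil {
				cur = cur.Parent
			}
		case xml.CharData:
			if cur != nil { // ignore whitespace outside the root element
				cur.Text += string(t)
			}
		}
	}
	if root == nil {
		return nil, fmt.Errorf("no root element found")
	}
	return root, nil
}

// ParseString is a convenience wrapper around BuildTree.
func ParseString(s string) (*Node, error) {
	return BuildTree(strings.NewReader(s))
}

func main() {
	root, err := ParseString(`<doc id="1"><p>Hello</p><p>World</p></doc>`)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(root.Name.Local, len(root.Children))
}
```

Keeping the tree builder separate from the tokenizer means the same tokenizer can also serve streaming consumers that never need a full tree.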

    As to XML parser features, some of the things that are MUSTs in my opinion are:

    • Handling of numeric character references (ampersand, hash, and number, e.g. &#65;) and the pre-defined general entities (amp, lt, gt, apos, quot)
    • Handling of the XML declaration (<?xml version="1.0" ...?>)
    • Handling of various input encodings, declared by the XML declaration or externally. Too many parsers skimp on this, but it is very important, since XML documents are designed so that their encoding can be reliably detected from the document itself.
    • Checking for uniqueness of attribute names within each element
    • Checking for proper nesting of elements
    • Skipping of comments
    • Skipping (if not handling) of processing instructions
    • CDATA handling -- it's simple to do
    • Keeping track of line numbers for error reporting
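Several items from this list can be seen in a short sketch, assuming Go's standard `encoding/xml` decoder is used as the tokenizer. `Describe` and `ErrorLine` are hypothetical helpers written for this example; the duplicate-attribute check is done by hand here in case the decoder does not reject duplicates itself.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// Describe returns one short line per token so the handling of the XML
// declaration, comments, entities, and CDATA can be inspected.
func Describe(doc string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var out []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err // nesting errors surface here
		}
		switch t := tok.(type) {
		case xml.ProcInst: // includes the XML declaration (target "xml")
			out = append(out, "procinst "+t.Target)
		case xml.Comment:
			out = append(out, "comment")
		case xml.StartElement:
			seen := map[xml.Name]bool{}
			for _, a := range t.Attr {
				if seen[a.Name] {
					return out, fmt.Errorf("duplicate attribute %q on <%s>",
						a.Name.Local, t.Name.Local)
				}
				seen[a.Name] = true
			}
			out = append(out, "start "+t.Name.Local)
		case xml.EndElement:
			out = append(out, "end "+t.Name.Local)
		case xml.CharData: // entities are already decoded; CDATA arrives as text
			out = append(out, "text "+string(t))
		}
	}
}

// ErrorLine returns the line number carried by a syntax error, or 0 if the
// error is of another kind.
func ErrorLine(err error) int {
	var se *xml.SyntaxError
	if errors.As(err, &se) {
		return se.Line
	}
	return 0
}

func main() {
	lines, err := Describe(`<?xml version="1.0"?><a><!-- note -->&lt;&#65;<![CDATA[<raw>]]></a>`)
	fmt.Println(lines, err)

	_, err = Describe("<a>\n<b>\n</a>") // improperly nested
	fmt.Println(err, "line:", ErrorLine(err))
}
```

A hand-written parser would need to provide the same behaviors itself: decoded references in text, CDATA passed through verbatim, and a line counter attached to every error.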

    Other eventually useful things are:

    • Namespace handling
    • Checking of character validity, both content and names
    • Normalization of linefeeds as per the XML specification
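The last two items are small enough to sketch directly. The following is a minimal illustration, assuming the XML 1.0 rules (end-of-line handling maps CR LF pairs and lone CR to LF, and the Char production defines which code points are legal). `NormalizeNewlines` and `IsXMLChar` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// NormalizeNewlines converts "\r\n" pairs and lone "\r" characters to "\n",
// which is the form an XML 1.0 parser passes on to the application.
func NormalizeNewlines(s string) string {
	s = strings.ReplaceAll(s, "\r\n", "\n")
	return strings.ReplaceAll(s, "\r", "\n")
}

// IsXMLChar reports whether r matches the XML 1.0 Char production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
func IsXMLChar(r rune) bool {
	return r == 0x9 || r == 0xA || r == 0xD ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

func main() {
	fmt.Printf("%q\n", NormalizeNewlines("a\r\nb\rc\n"))
	fmt.Println(IsXMLChar('\t'), IsXMLChar(0x0))
}
```

Checking name characters needs the separate NameStartChar and NameChar productions, which are not covered by this sketch.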
    This answer was selected as the best answer by the asker.