duanpao9781 2018-10-05 16:06
Accepted

Round-tripping XML through Unmarshal and MarshalIndent

I wanted to quickly create a utility to format any XML data using Go's xml.MarshalIndent().

However, this code:

package main

import (
    "encoding/xml"
    "fmt"
)

func main() {

    type node struct {
        XMLName  xml.Name
        Attrs    []xml.Attr `xml:",attr"`
        Text     string     `xml:",chardata"`
        Children []node     `xml:",any"`
    }

    x := node{}
    _ = xml.Unmarshal([]byte(doc), &x)
    buf, _ := xml.MarshalIndent(x, "", "  ") // prefix, indent

    fmt.Println(string(buf))
}

const doc string = `<book lang="en">
     <title>The old man and the sea</title>
       <author>Hemingway</author>
</book>`

produces:

<book>&#xA;     &#xA;       &#xA;
  <title>The old man and the sea</title>
  <author>Hemingway</author>
</book>

Notice the extraneous matter after the <book> opening element.

  • I've lost my attributes - why?
  • I'd like to avoid gathering spurious inter-element chardata - How?

1 answer

  • dtbl1231 2018-10-05 18:22

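    For starters, you aren't using the attribute struct tag correctly. To collect every attribute into a []xml.Attr field, the tag has to be `xml:",any,attr"` rather than `xml:",attr"`, so that part is a simple fix.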

    From https://godoc.org/encoding/xml#Unmarshal

    • If the XML element has an attribute not handled by the previous rule and the struct has a field with an associated tag containing ",any,attr", Unmarshal records the attribute value in the first such field.
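
As a minimal sketch of that rule, the program below reuses the struct from the question with only the Attrs tag changed to ",any,attr". The helper name roundTrip and the shortened sample document are assumptions made for illustration; they are not taken from the playground link.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Node mirrors the struct from the question; only the Attrs tag differs.
type Node struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:",any,attr"` // was `xml:",attr"` in the question
	Text     string     `xml:",chardata"`
	Children []Node     `xml:",any"`
}

// roundTrip (hypothetical helper) parses doc into a Node and re-indents it.
func roundTrip(doc string) (string, error) {
	var n Node
	if err := xml.Unmarshal([]byte(doc), &n); err != nil {
		return "", err
	}
	buf, err := xml.MarshalIndent(n, "", "  ")
	if err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	// No whitespace between elements here, so only the attribute handling is exercised.
	out, err := roundTrip(`<book lang="en"><title>The old man and the sea</title></book>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

With this tag the lang attribute should survive the round trip, because every attribute that no other field claims is appended to Attrs and written back on marshal.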

    Second, a field tagged xml:",chardata" is never passed through UnmarshalXML of the xml.Unmarshaler interface, so you can't simply create a new type for Text and implement that interface on it. (Per the same docs, quoted below, the field is expected to be a []byte or a string.)

    • If the XML element contains character data, that data is accumulated in the first struct field that has tag ",chardata". The struct field may have type []byte or string. If there is no such field, the character data is discarded.
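
To see where the extraneous matter comes from, here is a small sketch that prints the character data collected for a parent element. The helper name parentText and the tiny sample document are assumptions chosen for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Node keeps only the fields needed to observe the chardata rule.
type Node struct {
	XMLName  xml.Name
	Text     string `xml:",chardata"`
	Children []Node `xml:",any"`
}

// parentText (hypothetical helper) returns the chardata gathered for the root element.
func parentText(doc string) (string, error) {
	var n Node
	if err := xml.Unmarshal([]byte(doc), &n); err != nil {
		return "", err
	}
	return n.Text, nil
}

func main() {
	// The newlines and indentation around <b> are character data of <a>.
	text, err := parentText("<a>\n  <b>x</b>\n</a>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", text)
}
```

The whitespace between child elements is accumulated into the parent's Text, and MarshalIndent later writes it back with the newlines escaped as &#xA;.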

    Thus, the easiest way to deal with the unwanted characters is after the fact by just replacing them.

    Full code example here: https://play.golang.org/p/VSDskgfcLng

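    // Replacer strips newlines and tabs, both literal and as escaped character references.
    // Spaces are left untouched.
    var Replacer = strings.NewReplacer("&#xA;", "", "&#x9;", "", "\n", "", "\t", "")

    // recursiveReplace cleans the Text of n and of every descendant, in place.
    func recursiveReplace(n *Node) {
        n.Text = Replacer.Replace(n.Text)
        for i := range n.Children {
            recursiveReplace(&n.Children[i])
        }
    }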
    

    One could theoretically implement the xml.Unmarshaler interface for Node, but then you have to deal not only with manual XML parsing but also with the fact that it is a recursive structure. It's easiest to just remove the unwanted characters after the fact.
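
For comparison, here is a rough sketch of what the xml.Unmarshaler route might look like. It is not the code from the playground link. It assumes that trimming each chunk of character data with strings.TrimSpace is acceptable, which also alters mixed content such as "a <b>x</b> c", and it silently drops comments and processing instructions. The helper name format is hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

type Node struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:",any,attr"`
	Text     string     `xml:",chardata"`
	Children []Node     `xml:",any"`
}

// UnmarshalXML walks the token stream by hand and recurses for child elements.
// Assumption: whitespace around each chardata chunk can be discarded.
func (n *Node) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	n.XMLName = start.Name
	n.Attrs = start.Copy().Attr
	for {
		tok, err := d.Token()
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			var child Node
			if err := child.UnmarshalXML(d, t); err != nil {
				return err
			}
			n.Children = append(n.Children, child)
		case xml.CharData:
			n.Text += strings.TrimSpace(string(t))
		case xml.EndElement:
			// Children consume their own end tags, so this one closes n.
			return nil
		}
	}
}

// format (hypothetical helper) round-trips doc through Node.
func format(doc string) (string, error) {
	var n Node
	if err := xml.Unmarshal([]byte(doc), &n); err != nil {
		return "", err
	}
	buf, err := xml.MarshalIndent(n, "", "  ")
	return string(buf), err
}

func main() {
	out, err := format("<book lang=\"en\">\n     <title>The old man and the sea</title>\n       <author>Hemingway</author>\n</book>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

The struct tags are still needed for marshaling, since only the decoding side is customized here. Whether this is preferable to cleaning Text after the fact depends on how much mixed content the input contains.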

    This answer was accepted by the asker as the best answer.
