duanpao9781 2018-10-05 16:06
Accepted

Round-tripping XML through Unmarshal and MarshalIndent

I wanted to quickly create a utility to format any XML data using Go's xml.MarshalIndent().

However, this code:

package main

import (
    "encoding/xml"
    "fmt"
)

func main() {

    type node struct {
        XMLName  xml.Name
        Attrs    []xml.Attr `xml:",attr"`
        Text     string     `xml:",chardata"`
        Children []node     `xml:",any"`
    }

    x := node{}
    _ = xml.Unmarshal([]byte(doc), &x)
    buf, _ := xml.MarshalIndent(x, "", "  ") // prefix, indent

    fmt.Println(string(buf))
}

const doc string = `<book lang="en">
     <title>The old man and the sea</title>
       <author>Hemingway</author>
</book>`

produces:

<book>&#xA;     &#xA;       &#xA;
  <title>The old man and the sea</title>
  <author>Hemingway</author>
</book>

Notice the extraneous matter after the <book> opening element.

  • I've lost my attributes - why?
  • I'd like to avoid gathering spurious inter-element chardata - How?

1 answer

  • dtbl1231 2018-10-05 18:22

    For starters, the attribute struct tag is wrong: to collect arbitrary attributes into a []xml.Attr field, the tag must be `xml:",any,attr"`, not `xml:",attr"`. That part is a simple fix.

    From https://godoc.org/encoding/xml#Unmarshal

    • If the XML element has an attribute not handled by the previous rule and the struct has a field with an associated tag containing ",any,attr", Unmarshal records the attribute value in the first such field.
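
To illustrate the rule quoted above, here is a minimal sketch of the question's struct with only the Attrs tag changed to ",any,attr". The parse helper is a hypothetical name introduced for this example; it is not part of the original code.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// node mirrors the struct from the question; only the Attrs tag differs.
type node struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:",any,attr"` // was `xml:",attr"`
	Text     string     `xml:",chardata"`
	Children []node     `xml:",any"`
}

// parse (hypothetical helper) decodes a document into a node tree.
func parse(doc string) (node, error) {
	var x node
	err := xml.Unmarshal([]byte(doc), &x)
	return x, err
}

func main() {
	x, err := parse(`<book lang="en"><title>The old man and the sea</title></book>`)
	if err != nil {
		fmt.Println("unmarshal:", err)
		return
	}
	// The captured attributes are written back out on marshaling.
	buf, err := xml.MarshalIndent(x, "", "  ")
	if err != nil {
		fmt.Println("marshal:", err)
		return
	}
	fmt.Println(string(buf))
}
```

With the plain ",attr" tag, Unmarshal looks for a single attribute named after the field, which the document does not contain, so nothing is stored and nothing is written back.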

    Second, a field tagged xml:",chardata" is never passed through the UnmarshalXML method of the xml.Unmarshaler interface, so you can't simply define a new type for Text and implement that interface on it, as the same docs note. (Any field type other than []byte or string causes an error.)

    • If the XML element contains character data, that data is accumulated in the first struct field that has tag ",chardata". The struct field may have type []byte or string. If there is no such field, the character data is discarded.

    Thus, the easiest way to deal with the unwanted characters is after the fact by just replacing them.

    Full code example here: https://play.golang.org/p/VSDskgfcLng

    var Replacer = strings.NewReplacer("&#xA;", "", "&#x9;", "", "\n", "", "\t", "")
    
    func recursiveReplace(n *Node) {
        n.Text = Replacer.Replace(n.Text)
        for i := range n.Children {
            recursiveReplace(&n.Children[i])
        }
    }
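
As a complete round trip, the same after-the-fact cleanup can be sketched as follows. This variant is an assumption on my part rather than the code from the playground link: it uses strings.TrimSpace instead of the Replacer, because the unmarshaled Text already holds the decoded characters (real newlines and spaces, not "&#xA;"), and the question's document indents with spaces, which the Replacer above would leave in place. The names trimText and format are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

type Node struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:",any,attr"`
	Text     string     `xml:",chardata"`
	Children []Node     `xml:",any"`
}

// trimText strips leading and trailing whitespace from every node's text,
// which empties the whitespace-only text gathered between child elements.
func trimText(n *Node) {
	n.Text = strings.TrimSpace(n.Text)
	for i := range n.Children {
		trimText(&n.Children[i])
	}
}

// format (hypothetical helper) unmarshals, cleans up, and re-indents.
func format(doc string) (string, error) {
	var root Node
	if err := xml.Unmarshal([]byte(doc), &root); err != nil {
		return "", err
	}
	trimText(&root)
	buf, err := xml.MarshalIndent(root, "", "  ")
	if err != nil {
		return "", err
	}
	return string(buf), nil
}

const doc = `<book lang="en">
     <title>The old man and the sea</title>
       <author>Hemingway</author>
</book>`

func main() {
	out, err := format(doc)
	if err != nil {
		fmt.Println("format:", err)
		return
	}
	fmt.Println(out)
}
```

Note that trimming also removes leading and trailing whitespace from real text content, which may or may not be acceptable for a general-purpose formatter.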
    

    One could theoretically implement the xml.Unmarshaler interface for Node, but then you have to deal not only with manual XML parsing but also with the fact that Node is a recursive structure. It's easiest to just remove the unwanted characters after the fact.
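
For a sense of what the custom-unmarshaling route involves, here is a rough sketch; it is not taken from the original answer. It assumes that each run of character data can be trimmed and concatenated, so whitespace-only runs vanish and mixed content loses its position, and it ignores comments, directives, and processing instructions. The parse helper is a hypothetical name.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

type node struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:",any,attr"`
	Text     string     `xml:",chardata"`
	Children []node     `xml:",any"`
}

// UnmarshalXML walks the token stream by hand. Each nested start element
// recurses, and the matching end element terminates the current level.
func (n *node) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	n.XMLName = start.Name
	n.Attrs = start.Attr
	for {
		tok, err := d.Token()
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			var child node
			if err := child.UnmarshalXML(d, t); err != nil {
				return err
			}
			n.Children = append(n.Children, child)
		case xml.CharData:
			// Whitespace-only runs trim down to "" and add nothing.
			n.Text += strings.TrimSpace(string(t))
		case xml.EndElement:
			return nil
		}
	}
}

// parse (hypothetical helper) decodes a document into a node tree.
func parse(doc string) (node, error) {
	var root node
	err := xml.Unmarshal([]byte(doc), &root)
	return root, err
}

func main() {
	root, err := parse("<book lang=\"en\">\n  <title>The old man and the sea</title>\n</book>")
	if err != nil {
		fmt.Println("unmarshal:", err)
		return
	}
	buf, err := xml.MarshalIndent(root, "", "  ")
	if err != nil {
		fmt.Println("marshal:", err)
		return
	}
	fmt.Println(string(buf))
}
```

Marshaling still goes through the struct tags, since only the unmarshaling side is customized here.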
