doukaojie8573 2018-09-27 03:41
浏览 86
已采纳

如何从xml(包括标签)中提取完整的html?

I have the following code:

package main

import (
    "encoding/xml"
    "fmt"
)

func main() {
    xr := &xmlResponse{}

    if err := xml.Unmarshal([]byte(x), &xr); err != nil {
        panic(err)
    }

    fmt.Printf("%+v", xr)
}

type xmlResponse struct {
    //Title string `xml:"title,omitempty"`
    Title struct {
        BoldWords []struct {
            Bold string `xml:",chardata"`
        } `xml:"bold,omitempty"`
        Title string `xml:",chardata" `
    } `xml:"title,omitempty"`
}

var x = `<?xml version="1.0" encoding="utf-8"?>
<mytag version="1.0">
  <title><bold>Go</bold> is a programming language. I repeat: <bold>Go</bold> is a programming language.</title>
</mytag>`

This outputs:

&{Title:{BoldWords:[{Bold:Go} {Bold:Go}] Title: is a programming language. I repeat:  is a programming language.}}

How do I get:

<bold>Go</bold> is a programming language. I repeat: <bold>Go</bold> is a programming language.

In other words, I need not only the tags but also keep them in the proper place and not just a slice of bolded items. Trying to get it just as a string (e.g. uncommenting the first "Title" in xmlResponse struct) leaves out the bolded items entirely.

  • 写回答

1条回答 默认 最新

  • dswsl2016 2018-09-27 03:48
    关注

    From the Docs

    If the XML element contains character data, that data is
    accumulated in the first struct field that has tag ",chardata". The struct field may have type []byte or string. If there is no such field, the character data is discarded.

    This is actually not what you want, what you're looking for is:

    If the struct has a field of type []byte or string with tag
    ",innerxml", Unmarshal accumulates the raw XML nested inside the
    element in that field. The rest of the rules still apply.

    So, use innerxml instead of chardata.

    package main
    
    import (
        "encoding/xml"
        "fmt"
    )
    
    func main() {
        xr := &xmlResponse{}
    
        if err := xml.Unmarshal([]byte(x), &xr); err != nil {
            panic(err)
        }
    
        fmt.Printf("%+v", xr)
    }
    
    type xmlResponse struct {
        //Title string `xml:"title,omitempty"`
        Title struct {
            Title string `xml:",innerxml" `
        } `xml:"title,omitempty"`
    }
    
    var x = `<?xml version="1.0" encoding="utf-8"?>
    <mytag version="1.0">
      <title><bold>Go</bold> is a programming language. I repeat: <bold>Go</bold> is a programming language.</title>
    </mytag>`
    

    Outputs:

    &{Title:{Title:<bold>Go</bold> is a programming language. I repeat: <bold>Go</bold> is a programming language.}}
    

    Play

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥60 求一个简单的网页(标签-安全|关键词-上传)
  • ¥35 lstm时间序列共享单车预测,loss值优化,参数优化算法
  • ¥15 基于卷积神经网络的声纹识别
  • ¥15 Python中的request,如何使用ssr节点,通过代理requests网页。本人在泰国,需要用大陆ip才能玩网页游戏,合法合规。
  • ¥100 为什么这个恒流源电路不能恒流?
  • ¥15 有偿求跨组件数据流路径图
  • ¥15 写一个方法checkPerson,入参实体类Person,出参布尔值
  • ¥15 我想咨询一下路面纹理三维点云数据处理的一些问题,上传的坐标文件里是怎么对无序点进行编号的,以及xy坐标在处理的时候是进行整体模型分片处理的吗
  • ¥15 一直显示正在等待HID—ISP
  • ¥15 Python turtle 画图