How to get "<" and ">" in an XML string?

Is it possible to get the '<' and '>' values in this XML string? I have a problem with unmarshaling, and I can't change the strings. Can anyone help me with this? Here is my code:

package main

import (
    "encoding/xml"
    "fmt"
)

func main() {
    type Example struct {
        XMLName xml.Name `xml:"Shop"`
        ShopName  string `xml:"ShopName"`
    }

    myString1 := `<Shop> 
        <ShopName>Fresh Fruit <Fruit Shop></ShopName>
    </Shop>`

    myString2 :=`<Shop> 
        <ShopName>Fresh Fruit < Fruit Shop ></ShopName>
    </Shop>`

    // example 1
    var example1 Example
    err := xml.Unmarshal([]byte(myString1), &example1)
    if err != nil {
        // %v prints the error text; "%example1" was parsed as the verb %e
        fmt.Printf("error: %v\n", err)
    } else {
        fmt.Println(example1.ShopName)
    }

    // example 2
    var example2 Example
    err = xml.Unmarshal([]byte(myString2), &example2)
    if err != nil {
        fmt.Printf("error: %v\n", err)
        return
    } else {
        fmt.Println(example2.ShopName)
    }
}

I get the errors below:

error: XML syntax error on line 2: attribute name without = in element
error: XML syntax error on line 2: expected element name after <

What I want to get:

Fresh Fruit <Fruit Shop>
Fresh Fruit < Fruit Shop >

1 answer

doudui1850 2017-08-12 21:38

The input you have is definitely invalid XML. The bug is in whatever routine generates this XML.

Idea

Since you say you have to deal with it the way it is, here is a suggestion:

1. Use a regex to replace all closing tags with something you will basically never have in your input (e.g. @#lt#@/tagname@#gt#@). While doing that, save all the distinct tag names to a slice.
2. Using the slice of tag names, replace the start tags in the same way.
3. Now escape all remaining < and > as &lt; and &gt;.
4. Last but not least, turn the placeholders back into the original tags: @#lt#@ to < and @#gt#@ to >.

Now you should have valid xml that is parseable.

Proof of Concept

package main

import (
    "bytes"
    "fmt"
    "log"
    "regexp"
    "sort"
)

var (
    rlt = []byte("@#lt#@")
    rgt = []byte("@#gt#@")
    lt  = []byte("&lt;")
    gt  = []byte("&gt;")
)

// used for sorting strings by length
type ByLength []string

func (s ByLength) Len() int {
    return len(s)
}
func (s ByLength) Swap(i, j int) {
    s[i], s[j] = s[j], s[i]
}
func (s ByLength) Less(i, j int) bool {
    return len(s[i]) < len(s[j])
}

func main() {
    s := `<Shop>
    <ShopName>Fresh Fruit <Fruit Shop></ShopName>
    <ShopName attr="val1">Fresh Fruit <Shop test></ShopName>
</Shop>`

    // matches closing tags such as </ShopName>
    r1 := regexp.MustCompile("</([^<>]*)>")

    names := []string{}
    seen := map[string]bool{}
    out := r1.ReplaceAllFunc([]byte(s), func(b []byte) []byte {
        // strip the leading "</" and the trailing ">"
        name := b[2 : len(b)-1]

        // remember each distinct tag name only once
        if !seen[string(name)] {
            seen[string(name)] = true
            names = append(names, string(name))
        }

        // build the placeholder @#lt#@/name@#gt#@; the '/' must be kept,
        // otherwise the closing tag is restored as a start tag
        buf := make([]byte, 0, len(name)+13)
        buf = append(buf, rlt...)
        buf = append(buf, '/')
        buf = append(buf, name...)
        buf = append(buf, rgt...)
        return buf
    })

    // sort names descending by length otherwise we risk replacing parts of names like with <Shop and <ShopName
    sort.Sort(sort.Reverse(ByLength(names)))

    for _, name := range names {
        // replace only exact start tags
        out = bytes.Replace(out, []byte(fmt.Sprintf("<%s>", name)), []byte(fmt.Sprintf("@#lt#@%s@#gt#@", name)), -1)

        // replace start tags with attributes; QuoteMeta keeps regex
        // metacharacters in a tag name (such as '.') literal
        r3, err := regexp.Compile(fmt.Sprintf("<%s( [^<>=]+=\"[^<>]+)>", regexp.QuoteMeta(name)))
        if err != nil {
            log.Fatal(err)
        }
        out = r3.ReplaceAll(out, []byte(fmt.Sprintf("@#lt#@%s$1@#gt#@", name)))
    }

    out = bytes.Replace(out, []byte{'<'}, lt, -1)
    out = bytes.Replace(out, []byte{'>'}, gt, -1)

    out = bytes.Replace(out, rlt, []byte{'<'}, -1)
    out = bytes.Replace(out, rgt, []byte{'>'}, -1)

    fmt.Println(string(out))
}
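
Once the stray brackets are escaped, encoding/xml turns &lt; and &gt; back into < and > while decoding, which gives the values the question asks for. The following is only a minimal sketch: it assumes the escaping pass has already produced the sanitized string shown in main for the question's first input, the Example struct is copied from the question, and decodeShopName is a hypothetical helper name.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Example mirrors the struct from the question.
type Example struct {
	XMLName  xml.Name `xml:"Shop"`
	ShopName string   `xml:"ShopName"`
}

// decodeShopName is a hypothetical helper: it unmarshals XML that has
// already been sanitized and returns the decoded shop name.
func decodeShopName(sanitized []byte) (string, error) {
	var e Example
	if err := xml.Unmarshal(sanitized, &e); err != nil {
		return "", err
	}
	return e.ShopName, nil
}

func main() {
	// Assumed result of the escaping pass for the question's first input.
	sanitized := []byte(`<Shop>
    <ShopName>Fresh Fruit &lt;Fruit Shop&gt;</ShopName>
</Shop>`)

	name, err := decodeShopName(sanitized)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name)
}
```

The entity references are resolved by the decoder itself, so no extra unescaping step should be needed after Unmarshal.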

Notes

This is a proof of concept; it is not optimised for performance.
You might still run into content that is not escaped properly, in which case you will need to refine the algorithm further. Content that looks like one of the document's own tags will be falsely treated as a tag, for example <tagname> or <tagname something ="something>. Therefore expect some XML to still be invalid after this pass. Log the invalid XML so you can improve the algorithm.
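
One way to find the inputs worth logging is to run the sanitized output through the decoder and keep whatever still fails. This is a rough sketch, not part of the proof of concept above: checkWellFormed is a hypothetical helper, and it only checks syntax by reading every token with xml.Decoder, so it says nothing about whether the content was escaped the way you intended.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
)

// checkWellFormed is a hypothetical helper: it reads every token from data
// and returns the first syntax error, or nil if the decoder reaches the end
// of the input without complaint.
func checkWellFormed(data []byte) error {
	dec := xml.NewDecoder(bytes.NewReader(data))
	for {
		_, err := dec.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	inputs := []string{
		// properly escaped content
		`<Shop><ShopName>Fresh Fruit &lt;Fruit Shop&gt;</ShopName></Shop>`,
		// content that looks like a real tag and was therefore left unescaped
		`<Shop><ShopName>Fresh Fruit <Shop></ShopName></Shop>`,
	}
	for _, in := range inputs {
		if err := checkWellFormed([]byte(in)); err != nil {
			// In a real pipeline this is where the input would be logged.
			fmt.Printf("still invalid: %q: %v\n", in, err)
			continue
		}
		fmt.Printf("ok: %q\n", in)
	}
}
```

Inputs that fail this check are the ones to collect when refining the escaping rules.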

This answer was accepted by the asker as the best answer. The question was asked on 2017-08-12 16:50.