How to get '<' and '>' in an XML string?

Is it possible to get the '<' and '>' values in this XML string? I have a problem with unmarshalling, and I can't change the strings. Can anyone help me with this? Here is my code:

package main

import (
    "encoding/xml"
    "fmt"
)

func main() {
    type Example struct {
        XMLName xml.Name `xml:"Shop"`
        ShopName  string `xml:"ShopName"`
    }

    myString1 := `<Shop> 
        <ShopName>Fresh Fruit <Fruit Shop></ShopName>
    </Shop>`

    myString2 :=`<Shop> 
        <ShopName>Fresh Fruit < Fruit Shop ></ShopName>
    </Shop>`

    //example 1
    var example1 Example
    err := xml.Unmarshal([]byte(myString1), &example1)
    if err != nil {
        fmt.Printf("error: %v\n", err)
    } else {
        fmt.Println(example1.ShopName)
    }

    //example 2
    var example2 Example
    err = xml.Unmarshal([]byte(myString2), &example2)
    if err != nil {
        fmt.Printf("error: %v\n", err)
        return
    } else {
        fmt.Println(example2.ShopName)
    }
}

I get the errors below:

error: XML syntax error on line 2: attribute name without = in element
error: XML syntax error on line 2: expected element name after <

What I want to get:

Fresh Fruit <Fruit Shop>
Fresh Fruit < Fruit Shop >

1 answer

doudui1850 2017-08-12 21:38

The input you have is definitely not valid XML: a literal < inside element content has to be escaped as &lt;. The bug is in the routine that creates the XML.
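
For comparison, here is a minimal sketch of what a correct producer and consumer could look like, assuming the creating side can use Go's encoding/xml. The Example struct is the one from the question; the encode and decode helpers are hypothetical names introduced only for this illustration. xml.Marshal escapes < and > in the text, and xml.Unmarshal turns the entities back into the original characters.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Example mirrors the struct from the question.
type Example struct {
	XMLName  xml.Name `xml:"Shop"`
	ShopName string   `xml:"ShopName"`
}

// encode (hypothetical helper) builds the document with xml.Marshal,
// which escapes '<' and '>' in character data.
func encode(name string) (string, error) {
	b, err := xml.Marshal(Example{ShopName: name})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decode (hypothetical helper) unmarshals the document and returns
// the unescaped ShopName text.
func decode(doc string) (string, error) {
	var e Example
	if err := xml.Unmarshal([]byte(doc), &e); err != nil {
		return "", err
	}
	return e.ShopName, nil
}

func main() {
	doc, err := encode("Fresh Fruit <Fruit Shop>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(doc) // escaped form, safe to parse

	name, err := decode(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name) // Fresh Fruit <Fruit Shop>
}
```

If the producer is not under your control, this only shows what the input should have looked like; the workaround described in this answer is still needed.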

Idea

Since you say you have to deal with it the way it is, here is a suggestion:

1. Replace all closing tags via regex with a placeholder that you will basically never have in your input (e.g. @#lt#@/tagname@#gt#@). While doing that, save all the distinct tag names to a slice.
2. Using the slice of tag names, replace the start tags with placeholders in the same way.
3. Now escape all remaining < and > as &lt; and &gt;.
4. Last but not least, turn the placeholders back into the original tags: @#lt#@ to < and @#gt#@ to >.

Now you should have valid XML that is parseable.

Proof of Concept

package main

import (
    "bytes"
    "fmt"
    "log"
    "regexp"
    "sort"
)

var (
    rlt = []byte("@#lt#@")
    rgt = []byte("@#gt#@")
    lt  = []byte("&lt;")
    gt  = []byte("&gt;")
)

// used for sorting strings by length
type ByLength []string

func (s ByLength) Len() int {
    return len(s)
}
func (s ByLength) Swap(i, j int) {
    s[i], s[j] = s[j], s[i]
}
func (s ByLength) Less(i, j int) bool {
    return len(s[i]) < len(s[j])
}

func main() {
    s := `<Shop>
    <ShopName>Fresh Fruit <Fruit Shop></ShopName>
    <ShopName attr="val1">Fresh Fruit <Shop test></ShopName>
</Shop>`

    r1, err := regexp.Compile("</([^<>]*)>")
    if err != nil {
        log.Fatal(err)
    }

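    names := []string{}
    seen := map[string]bool{}
    out := r1.ReplaceAllFunc([]byte(s), func(b []byte) []byte {
        // b is the whole match "</name>"; strip the leading "</" and the trailing ">"
        name := b[2 : len(b)-1]

        // only append the name if it is not already in the list
        if !seen[string(name)] {
            seen[string(name)] = true
            names = append(names, string(name))
        }

        // build the placeholder @#lt#@/name@#gt#@, keeping the slash of the closing tag
        buf := make([]byte, 0, len(rlt)+1+len(name)+len(rgt))
        buf = append(buf, rlt...)
        buf = append(buf, '/')
        buf = append(buf, name...)
        buf = append(buf, rgt...)
        return buf
    })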

    // sort names descending by length otherwise we risk replacing parts of names like with <Shop and <ShopName
    sort.Sort(sort.Reverse(ByLength(names)))

    for _, name := range names {
        // replace only exact start tags
        out = bytes.Replace(out, []byte(fmt.Sprintf("<%s>", name)), []byte(fmt.Sprintf("@#lt#@%s@#gt#@", name)), -1)

        // replace start tags with attributes
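        // QuoteMeta keeps characters such as '.' in a tag name from being read as regex syntax
        r3, err := regexp.Compile(fmt.Sprintf("<%s( [^<>=]+=\"[^<>]+)>", regexp.QuoteMeta(name)))
        if err != nil {
            log.Fatal(err)
        }
        out = r3.ReplaceAll(out, []byte(fmt.Sprintf("@#lt#@%s${1}@#gt#@", name)))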
    }

    out = bytes.Replace(out, []byte{'<'}, lt, -1)
    out = bytes.Replace(out, []byte{'>'}, gt, -1)

    out = bytes.Replace(out, rlt, []byte{'<'}, -1)
    out = bytes.Replace(out, rgt, []byte{'>'}, -1)

    fmt.Println(string(out))
}

Notes

This is a proof of concept. It is not optimised for performance.
You might still run into content that is not escaped properly, in which case you will need to refine the algorithm further. If the content contains something like <tagname> or <tagname something ="something>, it will be falsely considered a tag. Therefore expect some XML to still be invalid. Log invalid XML so you can improve the algorithm.
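
One way to find the inputs that still come out invalid is to run every repaired document through a well-formedness check and log the failures. The following is a minimal sketch under that assumption; wellFormed is a hypothetical helper name, and the sample documents are made up for illustration. It relies on xml.Decoder.Token, which returns an error for syntax errors and for end tags that do not match.

```go
package main

import (
	"encoding/xml"
	"io"
	"log"
	"strings"
)

// wellFormed (hypothetical helper) tokenizes doc from start to finish
// and returns the first error the decoder reports, or nil if there is none.
func wellFormed(doc string) error {
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		_, err := dec.Token()
		if err == io.EOF {
			return nil // reached the end without a syntax error
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	// Made-up sample of repaired documents; the second one was not repaired correctly.
	repaired := []string{
		`<Shop><ShopName>Fresh Fruit &lt;Fruit Shop&gt;</ShopName></Shop>`,
		`<Shop><ShopName>Fresh Fruit <Fruit Shop></ShopName></Shop>`,
	}
	for _, doc := range repaired {
		if err := wellFormed(doc); err != nil {
			// keep the offending input so the repair rules can be improved later
			log.Printf("still invalid: %v\ninput: %s", err, doc)
		}
	}
}
```

The check only tells you that a document parses. It cannot tell you whether content that looked like a real tag was wrongly left unescaped, so the logged failures are a lower bound on the problem cases.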

This answer was selected by the asker as the best answer.
