XML parsing returns a string with newlines

I am trying to parse a sitemap XML in Go, and then loop over each address in it to get the details of the posts. But I am getting this weird error:

: first path segment in URL cannot contain colon

This is the code snippet:

type SitemapIndex struct {
    Locations []Location `xml:"sitemap"`
}

type Location struct {
    Loc string `xml:"loc"`
}

func (l Location) String() string {
    return fmt.Sprintf(l.Loc)
}

func main() {
    resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
    bytes, _ := ioutil.ReadAll(resp.Body)
    var s SitemapIndex
    xml.Unmarshal(bytes, &s)
    for _, Location := range s.Locations {
        fmt.Printf("Location: %s", Location.Loc)
        resp, err := http.Get(Location.Loc)
        fmt.Println("resp", resp)
        fmt.Println("err", err)
    }
}

And the output:

Location: 
https://www.washingtonpost.com/news-sitemaps/politics.xml
resp <nil>
err parse 
https://www.washingtonpost.com/news-sitemaps/politics.xml
: first path segment in URL cannot contain colon
Location: 
https://www.washingtonpost.com/news-sitemaps/opinions.xml
resp <nil>
err parse 
https://www.washingtonpost.com/news-sitemaps/opinions.xml
: first path segment in URL cannot contain colon
...
...

My guess is that Location.Loc contains a newline before and after the actual address. Eg: Location: \nhttps://www.washingtonpost.com/news-sitemaps/politics.xml\n

Because hardcoding the URL works as expected:

for _, Location := range s.Locations {
    fmt.Printf("Location: %s", Location.Loc)
    test := "https://www.washingtonpost.com/news-sitemaps/politics.xml"
    resp, err := http.Get(test)
    fmt.Println("resp", resp)
    fmt.Println("err", err)
}

Output (as you can see, the error is nil):

Location: 
https://www.washingtonpost.com/news-sitemaps/politics.xml
resp &{200 OK 200 HTTP/2.0 2 0 map[Server:[nginx] Arc-Service:[api] Arc-Org-Name:[washpost] Expires:[Sat, 02 Feb 2019 05:32:38 GMT] Content-Security-Policy:[upgrade-insecure-requests] Arc-Deployment:[washpost] Arc-Organization:[washpost] Cache-Control:[private, max-age=60] Arc-Context:[index] Arc-Application:[Feeds] Vary:[Accept-Encoding] Content-Type:[text/xml; charset=utf-8] Arc-Servername:[api.washpost.arcpublishing.com] Arc-Environment:[index] Arc-Org-Env:[washpost] Arc-Route:[/feeds] Date:[Sat, 02 Feb 2019 05:31:38 GMT]] 0xc000112870 -1 [] false true map[] 0xc00017c200 0xc0000ca370}
err <nil>
Location: 
...
...

But I am very new to Go, and so I have no idea what's wrong. Could you please tell me where I am wrong?

drra6593 2019-02-02 07:11
You are right, the issue comes from the newlines. Your Printf format string does not contain any newline, yet the output shows one before the URL and one after it, so they must be part of Location.Loc itself.

You can use strings.Trim to remove those newlines. Here is an example working with the sitemap that you are trying to parse. Once the string is trimmed, you will be able to call http.Get on it without any errors.

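func main() {
    resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
    bytes, _ := ioutil.ReadAll(resp.Body)
    var s SitemapIndex
    xml.Unmarshal(bytes, &s)
    for _, Location := range s.Locations {
        // Strip the leading and trailing newlines from the parsed value.
        loc := strings.Trim(Location.Loc, "\n")
        fmt.Printf("Location: %s\n", loc)
    }
}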

This code properly outputs the locations without any newlines, as expected:

Location: https://www.washingtonpost.com/news-sitemaps/politics.xml
Location: https://www.washingtonpost.com/news-sitemaps/opinions.xml
Location: https://www.washingtonpost.com/news-sitemaps/local.xml
Location: https://www.washingtonpost.com/news-sitemaps/sports.xml
Location: https://www.washingtonpost.com/news-sitemaps/national.xml
Location: https://www.washingtonpost.com/news-sitemaps/world.xml
Location: https://www.washingtonpost.com/news-sitemaps/business.xml
Location: https://www.washingtonpost.com/news-sitemaps/technology.xml
Location: https://www.washingtonpost.com/news-sitemaps/lifestyle.xml
Location: https://www.washingtonpost.com/news-sitemaps/entertainment.xml
Location: https://www.washingtonpost.com/news-sitemaps/goingoutguide.xml

The newlines in the Location.Loc field come from the XML returned by this URL. Its entries have the following form:

<sitemap>
<loc>
https://www.washingtonpost.com/news-sitemaps/goingoutguide.xml
</loc>
</sitemap>

And as you can see, there are newlines before and after the content within the loc elements.
The asker selected this answer as the best answer.
