dtdt0454 2017-09-26 10:08

XML Unmarshal gets tripped up by parallel tags

I'm working on an RSS reader application and running into an issue with The New York Times RSS feed. I've narrowed the issue down to the following XML (unnecessary fields omitted):

<item>
  <link>https://www.nytimes.com/2017/09/25/briefing/nfl-angela-merkel-iraqi-kurdistan.html?partner=rss&amp;emc=rss</link>
  <atom:link rel="standout" href="https://www.nytimes.com/2017/09/25/briefing/nfl-angela-merkel-iraqi-kurdistan.html?partner=rss&amp;emc=rss"/>
  <pubDate>Mon, 25 Sep 2017 13:36:07 GMT</pubDate>
</item>

I'm trying to parse it into the following structure:

type item struct {
    Link    string `xml:"link"`
    PubDate string `xml:"pubDate"`
}

When parsed, the Link field is blank. However, if I delete the atom:link element, it works fine. I think the similarity in the tag names is confusing the parser. I have a Go playground that demonstrates the issue and shows that removing that line fixes it: https://play.golang.org/p/fUbLhSbo5K

How can I work around this issue? It's not really feasible to special-case this feed, because other feeds could do the same thing.

1 answer

  • drn34916 2017-09-26 10:21

    This is a long-standing documentation bug in Go's encoding/xml package. Basically, when a field tag doesn't specify a namespace, the field matches an element with that name in any namespace, not only an element with no namespace. That is why the empty atom:link element also ends up in Link and leaves it blank. In fact, there is no way to make a field match only when there is no namespace. If your XML has a namespace, the solution is to set it explicitly in the tag:

    <item xmlns="foo">
      <link>https://www.nytimes.com/2017/09/25/briefing/nfl-angela-merkel-iraqi-kurdistan.html?partner=rss&amp;emc=rss</link>
      <atom:link rel="standout" href="https://www.nytimes.com/2017/09/25/briefing/nfl-angela-merkel-iraqi-kurdistan.html?partner=rss&amp;emc=rss"/>
      <pubDate>Mon, 25 Sep 2017 13:36:07 GMT</pubDate>
    </item>
    
    type item struct {
        Link    string `xml:"foo link"`
        PubDate string `xml:"pubDate"`
    }
    

    Playground: https://play.golang.org/p/L9WOhixTKa.

    If your link element explicitly doesn't have a namespace, you'll probably have to write your own UnmarshalXML method.
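
A minimal sketch of such an UnmarshalXML method, assuming the feed declares no default namespace so that a plain <link> arrives with an empty Name.Space. The helper parseItem and the constant sample are hypothetical names introduced for this illustration; the struct tags are dropped because the matching is done by hand.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// item mirrors the struct from the question. The xml tags are omitted
// because UnmarshalXML below matches child elements manually.
type item struct {
	Link    string
	PubDate string
}

// UnmarshalXML implements xml.Unmarshaler. It decodes only children whose
// namespace is empty, so <atom:link> never touches Link.
func (it *item) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	for {
		tok, err := d.Token()
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			var dst *string
			if t.Name.Space == "" {
				switch t.Name.Local {
				case "link":
					dst = &it.Link
				case "pubDate":
					dst = &it.PubDate
				}
			}
			if dst == nil {
				// Namespaced or unknown child: consume and ignore it.
				if err := d.Skip(); err != nil {
					return err
				}
				continue
			}
			if err := d.DecodeElement(dst, &t); err != nil {
				return err
			}
		case xml.EndElement:
			// Every child is fully consumed above, so this closes <item>.
			return nil
		}
	}
}

// parseItem is a hypothetical helper for this sketch.
func parseItem(data []byte) (item, error) {
	var it item
	err := xml.Unmarshal(data, &it)
	return it, err
}

// sample is the XML from the question.
const sample = `<item>
  <link>https://www.nytimes.com/2017/09/25/briefing/nfl-angela-merkel-iraqi-kurdistan.html?partner=rss&amp;emc=rss</link>
  <atom:link rel="standout" href="https://www.nytimes.com/2017/09/25/briefing/nfl-angela-merkel-iraqi-kurdistan.html?partner=rss&amp;emc=rss"/>
  <pubDate>Mon, 25 Sep 2017 13:36:07 GMT</pubDate>
</item>`

func main() {
	it, err := parseItem([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("Link:", it.Link)
	fmt.Println("PubDate:", it.PubDate)
}
```

With this approach Link should be filled from the un-namespaced element only. If a feed does declare a default namespace on an enclosing element, plain <link> elements inherit it and the Name.Space check would need to compare against that namespace instead.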
