dtdt0454 2017-09-26 10:08

XML Unmarshal gets tripped up by parallel tags

I'm working on an RSS reader application and running into an issue with The New York Times RSS feed. I've narrowed the issue down to the following XML (unnecessary fields omitted):

<item>
  <link>https://www.nytimes.com/2017/09/25/briefing/nfl-angela-merkel-iraqi-kurdistan.html?partner=rss&amp;emc=rss</link>
  <atom:link rel="standout" href="https://www.nytimes.com/2017/09/25/briefing/nfl-angela-merkel-iraqi-kurdistan.html?partner=rss&amp;emc=rss"/>
  <pubDate>Mon, 25 Sep 2017 13:36:07 GMT</pubDate>
</item>

I'm trying to parse it into the following structure:

type item struct {
    Link    string `xml:"link"`
    PubDate string `xml:"pubDate"`
}

When parsed, the Link field is blank. If I delete the atom:link element, it works fine, so I think the similarity in the tag names is confusing the parser. I have a Go playground that demonstrates the issue and shows that removing that line fixes it: https://play.golang.org/p/fUbLhSbo5K How can I work around this? Special-casing this feed isn't really feasible, because other feeds could do the same thing.

1 answer

  • drn34916 2017-09-26 10:21

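    This is a long-standing documentation bug in Go's encoding/xml package. When a field tag doesn't specify a namespace, the field matches an element with that local name in any namespace, not only elements with no namespace. In your feed, both link and atom:link therefore match the Link field, and the later, empty atom:link overwrites the value read from link. In fact, there is no way to make a field match only when there is no namespace. If the elements in your XML are in a namespace, the solution is to name it explicitly in the tag: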

    <item xmlns="foo">
      <link>https://www.nytimes.com/2017/09/25/briefing/nfl-angela-merkel-iraqi-kurdistan.html?partner=rss&amp;emc=rss</link>
      <atom:link rel="standout" href="https://www.nytimes.com/2017/09/25/briefing/nfl-angela-merkel-iraqi-kurdistan.html?partner=rss&amp;emc=rss"/>
      <pubDate>Mon, 25 Sep 2017 13:36:07 GMT</pubDate>
    </item>
    
    type item struct {
        Link    string `xml:"foo link"`
        PubDate string `xml:"pubDate"`
    }
    

    Playground: https://play.golang.org/p/L9WOhixTKa.
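
To make the difference concrete, here is a minimal runnable sketch that decodes the same document twice, once with an unqualified tag and once with a namespace-qualified one. The type names `looseItem` and `strictItem`, the helper `parseLinks`, and the example.com URLs are made up for illustration; the namespace `foo` is the placeholder used above.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// feed is a shortened stand-in for the NYT item, with a default namespace "foo".
const feed = `<item xmlns="foo">
  <link>https://example.com/a?partner=rss&amp;emc=rss</link>
  <atom:link rel="standout" href="https://example.com/a?partner=rss&amp;emc=rss"/>
  <pubDate>Mon, 25 Sep 2017 13:36:07 GMT</pubDate>
</item>`

// looseItem uses an unqualified tag, so it matches "link" in any namespace.
type looseItem struct {
	Link string `xml:"link"`
}

// strictItem only matches "link" elements in the "foo" namespace.
type strictItem struct {
	Link string `xml:"foo link"`
}

// parseLinks decodes the same bytes with both structs and returns both results.
func parseLinks(data []byte) (loose, strict string, err error) {
	var l looseItem
	if err = xml.Unmarshal(data, &l); err != nil {
		return "", "", err
	}
	var s strictItem
	if err = xml.Unmarshal(data, &s); err != nil {
		return "", "", err
	}
	return l.Link, s.Link, nil
}

func main() {
	loose, strict, err := parseLinks([]byte(feed))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("unqualified tag: %q\n", loose)
	fmt.Printf("qualified tag:   %q\n", strict)
}
```

With the unqualified tag, the empty atom:link element is decoded last and leaves the field blank; the qualified tag ignores it.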

    If your link element explicitly has no namespace, you'll probably have to write your own UnmarshalXML method.
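
A custom UnmarshalXML along those lines might look like the sketch below. It is one possible approach under stated assumptions, not the only way to do it: it walks the children of item by hand and accepts link and pubDate only when the element's namespace is empty, skipping everything else. The helper `parseItem` and the example.com URLs are hypothetical, and the struct drops its field tags because the method does the matching itself.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// item mirrors the struct from the question; matching is done in UnmarshalXML.
type item struct {
	Link    string
	PubDate string
}

// UnmarshalXML reads the children of <item> token by token and fills a field
// only when the child element has no namespace (el.Name.Space == "").
func (it *item) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	for {
		tok, err := d.Token()
		if err != nil {
			return err
		}
		switch el := tok.(type) {
		case xml.StartElement:
			var dst *string
			if el.Name.Space == "" {
				switch el.Name.Local {
				case "link":
					dst = &it.Link
				case "pubDate":
					dst = &it.PubDate
				}
			}
			if dst == nil {
				// Namespaced or unknown element (e.g. atom:link): skip it entirely.
				if err := d.Skip(); err != nil {
					return err
				}
				continue
			}
			if err := d.DecodeElement(dst, &el); err != nil {
				return err
			}
		case xml.EndElement:
			// Children are consumed above, so this is the end of <item>.
			return nil
		}
	}
}

// parseItem is a small hypothetical helper wrapping xml.Unmarshal.
func parseItem(data []byte) (item, error) {
	var it item
	err := xml.Unmarshal(data, &it)
	return it, err
}

func main() {
	const feed = `<item>
  <link>https://example.com/a?partner=rss&amp;emc=rss</link>
  <atom:link rel="standout" href="https://example.com/a?partner=rss&amp;emc=rss"/>
  <pubDate>Mon, 25 Sep 2017 13:36:07 GMT</pubDate>
</item>`
	it, err := parseItem([]byte(feed))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("Link: %q\nPubDate: %q\n", it.Link, it.PubDate)
}
```

The trade-off is that every field has to be listed in the switch, so this is only worth doing for the structs where the namespace collision actually occurs.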

    This answer was accepted by the asker as the best answer.
