doujiu8479 2015-11-15 10:54

Parsing invalid XML in Go

This URL serves an XML feed: http://www.guru.com/rss/jobs/ When I try to parse it with encoding/xml, I get this error:

XML syntax error on line 1: invalid XML name: t

I know the XML is broken, but how can I ignore the error and still parse the items that come before it?

The last item in the XML looks like this:

<item>
    <title>Online Ad Posting Data Entry Jobs</t
    <?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
        <channel>
            <title>Guru Jobs</title>
            <link>http://www.guru.com</link>
            <description>Guru Jobs</description>
            <lastBuildDate>Sun, 15 Nov 2015 11:04:51 GMT</lastBuildDate>
            <language>en-us</language>
            <atom:link href='http://www.guru.com/rss/jobs/' rel="self" type="application/rss+xml" />
        </channel>
    </rss>
    itle>
    <link>http://www.guru.com/jobs/online-ad-posting-data-entry-jobs/1189496</link>
    <guid>http://www.guru.com/jobs/online-ad-posting-data-entry-jobs/1189496</guid>
</item> 

Code example:

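// Imports used below: "bytes", "encoding/xml", "fmt", and a charset package
// that provides NewReaderLabel (presumably golang.org/x/net/html/charset).

type Rss2 struct {
    ItemList []Item `xml:"channel>item"`
}

type Item struct {
    Title       string `xml:"title"`
    Link        string `xml:"link"`
    Description string `xml:"description"`
    PubDate     string `xml:"pubDate"`
    GUID        string `xml:"guid"`
}

// xmlRead holds the raw bytes of the downloaded feed.
r := Rss2{}
reader := bytes.NewReader(xmlRead)
decoder := xml.NewDecoder(reader)
decoder.CharsetReader = charset.NewReaderLabel
decoder.Strict = false
err = decoder.Decode(&r)
if err != nil {
    // Println avoids treating the error text as a format string.
    fmt.Println(err)
}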

2 answers

  • douju8782 2015-11-16 04:03

    XML tags must be properly opened and closed. In the XML you posted, the XML declaration does not appear at the beginning of the document:

    <?xml version="1.0" encoding="utf-8"?>
    

    This declaration should be at the very beginning of the document. Hope this helps.
