douxian9060 2016-02-17 02:57

How to read bad XML with Go

I'd like to use Go to read an XML file. The problem is that it's a bad XML file -- it doesn't conform to the spec. Here's a sample:

<?xml version="1.0" encoding="UTF-8"?>
<something abc="1" def="2">
    <0 x="a"/>
    <1 x="b"/>
    <2 x="c"/>
    <26 x="z"/>
</something>

My Go program correctly gives an error when trying to read this:

$ go run rs.go <real.xml
chardata: '
'
start: name.local='something'
start {{ something} [{{ abc} 1} {{ def} 2}]}
'abc'='1'
'def'='2'
offset=66
chardata: '
    '
XML syntax error on line 3: invalid XML name: 0
exit status 1

Here's the little Go program:

package main

import (
    "encoding/xml"
    "fmt"
    "io"
    "os"
)

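//  <something abc="1" def="2">
// Fields must be exported for encoding/xml to populate them,
// and attributes need the ",attr" tag option.
type Something struct {
    Abc   string `xml:"abc,attr"`
    Def   string `xml:"def,attr"`
    Spots []Spot
}

//    <0 x="a"/>
type Spot struct {
    Num  int    // ??
    Xval string `xml:"x,attr"`
}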

func main() {
    dec := xml.NewDecoder(os.Stdin)
    //  dec.Strict = false      // doesn't help  <0 ...> problem
    //  dec.Entity = xml.HTMLEntity

    for {
        tok, err := dec.Token()
        if err == io.EOF {
            break
        } else if err != nil {
            fmt.Fprintf(os.Stderr, "%v\n", err)
            os.Exit(1)
        }

        switch tok := tok.(type) {
        case xml.StartElement:
            fmt.Printf("start: name.local='%s'\n", tok.Name.Local)
            fmt.Printf("start %v\n", tok)
            for _, a := range tok.Attr {
                fmt.Printf("'%s'='%s'\n", a.Name.Local, a.Value)
            }
            fmt.Printf("offset=%d\n", dec.InputOffset())
        case xml.EndElement:
            fmt.Printf("end: name.local='%s'\n", tok.Name.Local)
        case xml.CharData:
            fmt.Printf("chardata: '%s'\n", tok)
        case xml.Comment:
            fmt.Printf("comment: '%s'\n", tok)
        }
    }
}

Is there a Go expert out there who can help me figure out how to get Go to read this goofy XML file? Thanks!

2 answers

  • douliu7929 2016-02-25 02:04

    Posting my comment as an answer.

    It doesn't seem like you can use Go's encoding/xml package directly here, because XML element names may not start with a digit and the decoder rejects them. But you could:

    • fork the xml package and change its isName function to allow your format, or
    • sanitize the input first, turning it into valid XML, and then parse it with the standard xml package, or
    • implement your own parser, which is probably a good option depending on how wild your "XML" input is. The Gopher Academy blog explains how: advent-2014/parsers-lexers
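
To illustrate the sanitizing option, here is a minimal sketch, not a definitive implementation. It assumes the only problem in the input is element names that start with a digit, and that the sequence `<` followed by a digit never appears inside comments, CDATA sections, or attribute values. The `n` prefix, the `sanitize` and `parse` helpers, and the `Num` method are names made up for this example.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"os"
	"regexp"
	"strconv"
	"strings"
)

// numericTag matches the start of a tag whose name begins with a digit,
// such as "<0" or "</26". This is a naive textual match (assumption: the
// pattern never occurs in comments, CDATA, or attribute values).
var numericTag = regexp.MustCompile(`<(/?)([0-9])`)

// prefix is a hypothetical marker prepended to make the names valid XML.
const prefix = "n"

// sanitize rewrites <0 .../> as <n0 .../> so that encoding/xml accepts it.
func sanitize(in []byte) []byte {
	return numericTag.ReplaceAll(in, []byte("<${1}"+prefix+"${2}"))
}

// Spot captures one numbered child element.
type Spot struct {
	XMLName xml.Name // records the sanitized element name, e.g. "n26"
	X       string   `xml:"x,attr"`
}

// Something maps the root element; ",any" collects every child element.
type Something struct {
	Abc   string `xml:"abc,attr"`
	Def   string `xml:"def,attr"`
	Spots []Spot `xml:",any"`
}

// Num recovers the original numeric element name.
func (s Spot) Num() (int, error) {
	return strconv.Atoi(strings.TrimPrefix(s.XMLName.Local, prefix))
}

// parse sanitizes the raw input and unmarshals it.
func parse(in []byte) (Something, error) {
	var s Something
	err := xml.Unmarshal(sanitize(in), &s)
	return s, err
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<something abc="1" def="2">
    <0 x="a"/>
    <26 x="z"/>
</something>`

func main() {
	s, err := parse([]byte(sample))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, sp := range s.Spots {
		n, err := sp.Num()
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("spot %d: x=%q\n", n, sp.X)
	}
}
```

For real files you would read the bytes from os.Stdin or a file instead of the embedded sample. If the input can contain `<` followed by a digit in other places, a regular expression is too blunt and a small hand-written lexer is the safer route.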
    The asker selected this answer as the best answer.