douqihou7537 2019-02-09 12:27
Accepted

How to decode XML using goroutines

I'm working on a proof of concept to measure how long it takes to parse an XML document that contains a given number of elements.

First, here is the struct that represents the entries in my XML document:

type Node struct {
    ID             int    `xml:"id,attr"`
    Position       int    `xml:"position,attr"`
    Depth          int    `xml:"depth,attr"`
    Parent         string `xml:"parent,attr"`
    Name           string `xml:"Name"`
    Description    string `xml:"Description"`
    OwnInformation struct {
        Title       string `xml:"Title"`
        Description string `xml:"Description"`
    } `xml:"OwnInformation"`
    Assets []struct {
        ID           string `xml:"id,attr"`
        Position     int    `xml:"position,attr"`
        Type         string `xml:"type,attr"`
        Category     int    `xml:"category,attr"`
        OriginalFile string `xml:"OriginalFile"`
        Description  string `xml:"Description"`
        URI          string `xml:"Uri"`
    } `xml:"Assets>Asset"`
    Synonyms []string `xml:"Synonyms>Synonym"`
}

Next, I have a factory that can generate a document with any given number of elements:

func CreateNodeXMLDocumentBytes(
    nodeElementCount int) []byte {

    xmlContents := new(bytes.Buffer)

    xmlContents.WriteString("<ROOT>\n")

    for iterationCounter := 0; iterationCounter < nodeElementCount; iterationCounter++ {
        appendNodeXMLElement(iterationCounter, xmlContents)
    }

    xmlContents.WriteString("</ROOT>")

    return xmlContents.Bytes()
}

// appendNodeXMLElement (unexported) appends one '<Node />' element to an existing bytes.Buffer.
func appendNodeXMLElement(
    counter int,
    xmlDocument *bytes.Buffer) {

    xmlDocument.WriteString("<Node id=\"" + strconv.Itoa(counter) + "\" position=\"0\" depth=\"0\" parent=\"0\">\n")
    xmlDocument.WriteString("    <Name>Name</Name>\n")
    xmlDocument.WriteString("    <Description>Description</Description>\n")
    xmlDocument.WriteString("    <OwnInformation>\n")
    xmlDocument.WriteString("        <Title>Title</Title>\n")
    xmlDocument.WriteString("        <Description>Description</Description>\n")
    xmlDocument.WriteString("    </OwnInformation>\n")
    xmlDocument.WriteString("    <Assets>\n")
    xmlDocument.WriteString("        <Asset id=\"0\" position=\"0\" type=\"0\" category=\"0\">\n")
    xmlDocument.WriteString("            <OriginalFile>OriginalFile</OriginalFile>\n")
    xmlDocument.WriteString("            <Description>Description</Description>\n")
    xmlDocument.WriteString("            <Uri>Uri</Uri>\n")
    xmlDocument.WriteString("        </Asset>\n")
    xmlDocument.WriteString("        <Asset id=\"1\" position=\"1\" type=\"1\" category=\"1\">\n")
    xmlDocument.WriteString("            <OriginalFile>OriginalFile</OriginalFile>\n")
    xmlDocument.WriteString("            <Description>Description</Description>\n")
    xmlDocument.WriteString("            <Uri>Uri</Uri>\n")
    xmlDocument.WriteString("        </Asset>\n")
    xmlDocument.WriteString("        <Asset id=\"2\" position=\"2\" type=\"2\" category=\"2\">\n")
    xmlDocument.WriteString("            <OriginalFile>OriginalFile</OriginalFile>\n")
    xmlDocument.WriteString("            <Description>Description</Description>\n")
    xmlDocument.WriteString("            <Uri>Uri</Uri>\n")
    xmlDocument.WriteString("        </Asset>\n")
    xmlDocument.WriteString("        <Asset id=\"3\" position=\"3\" type=\"3\" category=\"3\">\n")
    xmlDocument.WriteString("            <OriginalFile>OriginalFile</OriginalFile>\n")
    xmlDocument.WriteString("            <Description>Description</Description>\n")
    xmlDocument.WriteString("            <Uri>Uri</Uri>\n")
    xmlDocument.WriteString("        </Asset>\n")
    xmlDocument.WriteString("        <Asset id=\"4\" position=\"4\" type=\"4\" category=\"4\">\n")
    xmlDocument.WriteString("            <OriginalFile>OriginalFile</OriginalFile>\n")
    xmlDocument.WriteString("            <Description>Description</Description>\n")
    xmlDocument.WriteString("            <Uri>Uri</Uri>\n")
    xmlDocument.WriteString("        </Asset>\n")
    xmlDocument.WriteString("    </Assets>\n")
    xmlDocument.WriteString("    <Synonyms>\n")
    xmlDocument.WriteString("        <Synonym>Synonym 0</Synonym>\n")
    xmlDocument.WriteString("        <Synonym>Synonym 1</Synonym>\n")
    xmlDocument.WriteString("        <Synonym>Synonym 2</Synonym>\n")
    xmlDocument.WriteString("        <Synonym>Synonym 3</Synonym>\n")
    xmlDocument.WriteString("        <Synonym>Synonym 4</Synonym>\n")
    xmlDocument.WriteString("    </Synonyms>\n")
    xmlDocument.WriteString("</Node>\n")
}

Next, I have the application that creates a sample document and decodes each '<Node />' element:

func main() {
    nodeXMLDocumentBytes := factories.CreateNodeXMLDocumentBytes(100)

    xmlDocReader := bytes.NewReader(nodeXMLDocumentBytes)
    xmlDocDecoder := xml.NewDecoder(xmlDocReader)

    xmlDocNodeElementCounter := 0

    start := time.Now()

    for {
        token, _ := xmlDocDecoder.Token()
        if token == nil {
            break
        }

        switch element := token.(type) {
        case xml.StartElement:
            if element.Name.Local == "Node" {
                xmlDocNodeElementCounter++

                xmlDocDecoder.DecodeElement(new(entities.Node), &element)
            }
        }
    }

    fmt.Println("Total '<Node />' elements in the XML document: ", xmlDocNodeElementCounter)
    fmt.Printf("Total elapsed time: %v\n", time.Since(start))
}

This takes around 11ms on my machine.

Next, I used goroutines to decode the XML elements:

func main() {
    nodeXMLDocumentBytes := factories.CreateNodeXMLDocumentBytes(100)

    xmlDocReader := bytes.NewReader(nodeXMLDocumentBytes)
    xmlDocDecoder := xml.NewDecoder(xmlDocReader)

    xmlDocNodeElementCounter := 0

    start := time.Now()

    for {
        token, _ := xmlDocDecoder.Token()
        if token == nil {
            break
        }

        switch element := token.(type) {
        case xml.StartElement:
            if element.Name.Local == "Node" {
                xmlDocNodeElementCounter++

                go xmlDocDecoder.DecodeElement(new(entities.Node), &element)
            }
        }
    }

    time.Sleep(time.Second * 5)

    fmt.Println("Total '<Node />' elements in the XML document: ", xmlDocNodeElementCounter)
    fmt.Printf("Total elapsed time: %v\n", time.Since(start))
}

I use a simple time.Sleep call to give the goroutines time to finish. I know it should properly be implemented with channels and a worker queue.

According to the console output, only 3 elements are decoded. What happened to the other elements? Does it have something to do with the fact that I'm reading from a stream?

Is there a way to make the decoding concurrent so that the total time needed to decode all the elements goes down?

1 answer

  • dpb56083 2019-02-09 13:54

    You have only one xml.Decoder object. Every call that reads from it, whether xmlDocDecoder.Token() in the main loop or xmlDocDecoder.DecodeElement() in a goroutine, consumes the next tokens from the same single input stream. In your example the main loop and every goroutine you launch read from that stream at the same time with nothing synchronizing them, so the tokens end up split across the goroutines more or less at random. If you run this again you will probably get different results, and I'm a little surprised it works at all without panicking in some strange way.

    A couple of things about XML make this hard to parallelize. The sequence you actually need to achieve here is:

    1. Notice a <Node> start-element event.
    2. Read forward until the matching </Node> end-element event, at the same depth, remembering every event you passed in the meantime.
    3. Launch a goroutine to unmarshal all of the events you remembered into a structure.

    In practice it's likely the "remember every event" step is as expensive as just doing the unmarshal, and that this entire sequence will be much faster than the disk or network I/O to read the file in the first place. This doesn't seem like something that will parallelize well.

    "This takes around 11ms on my machine."

    You're not really doing enough work to get a good sense of whether it's "fast" or "slow". Look at the benchmarking support in the testing package for a better approach, plus the built-in profiling tools. Those will tell you where the time is actually going and suggest what you could improve.
