I have a 100 GB XML file and parse it with a SAX-style approach in Go with this code:
file, err := os.Open(filename)
handle(err)
defer file.Close()
buffer := bufio.NewReaderSize(file, 1024*1024*256) // 256 MiB read buffer
decoder := xml.NewDecoder(buffer)
for {
    t, err := decoder.Token()
    if t == nil || err != nil {
        break
    }
    switch se := t.(type) {
    case xml.StartElement:
        if se.Name.Local == "House" {
            house := House{}
            err := decoder.DecodeElement(&house, &se)
            handle(err)
        }
    }
}
But the Go version runs very slowly, judging by execution time and disk usage. My HDD can read at around 100-120 MB/s, but the Go program reads only 10-13 MB/s. As an experiment, I rewrote this code in C#:
using (XmlReader reader = XmlReader.Create(filename))
{
    while (reader.Read())
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                if (reader.Name == "House")
                {
                    //Code
                }
                break;
        }
    }
}
And I got the HDD fully loaded: C# reads at 100-110 MB/s, and its execution time is about 10 times lower.
How can I improve XML parsing performance in Go?