I've been working with some huge files that I have to convert to UTF-8; because the files are enormous, traditional tools like iconv won't work. So I decided to write my own tool in Go. However, I noticed that the encoding conversion is quite slow in Go. Here is my code:
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"golang.org/x/text/encoding/charmap"
)

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintf(os.Stderr, "usage:\n\t%s [input] [output]\n", os.Args[0])
		os.Exit(1)
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	out, err := os.Create(os.Args[2])
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()
	r := charmap.ISO8859_1.NewDecoder().Reader(f)
	buf := make([]byte, 1048576) // 1 MiB copy buffer
	if _, err := io.CopyBuffer(out, r, buf); err != nil {
		log.Fatal(err)
	}
}
Similar code in Python is much more performant:
import codecs

BLOCKSIZE = 1048576  # or some other desired size in bytes

with codecs.open("FRWAC-01.xml", "r", "latin_1") as sourceFile:
    with codecs.open("FRWAC-01-utf8.xml", "w", "utf-8") as targetFile:
        while True:
            contents = sourceFile.read(BLOCKSIZE)
            if not contents:
                break
            targetFile.write(contents)
I was sure my Go code would be much quicker, since I/O in Go is generally fast, but it turns out to be much slower than the Python code. Is there a way to improve the Go program?
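For context on why I expected this to be fast: ISO-8859-1 maps every byte directly to the Unicode code point of the same value, so the transcoding itself should be nearly free. A hypothetical byte-level sketch of that mapping (`latin1ToUTF8` is just an illustrative helper I made up, not code I'm actually running) looks like this:

```go
package main

import "fmt"

// latin1ToUTF8 converts ISO-8859-1 bytes to UTF-8.
// Bytes below 0x80 are already valid UTF-8 and pass through;
// bytes 0x80-0xFF encode to exactly two UTF-8 bytes.
func latin1ToUTF8(in []byte) []byte {
	out := make([]byte, 0, len(in)*2) // worst case: every byte doubles
	for _, b := range in {
		if b < 0x80 {
			out = append(out, b)
		} else {
			out = append(out, 0xC0|b>>6, 0x80|b&0x3F)
		}
	}
	return out
}

func main() {
	// 0xE9 is 'é' in ISO-8859-1; in UTF-8 it becomes the pair 0xC3 0xA9.
	fmt.Printf("%q\n", latin1ToUTF8([]byte("caf\xe9")))
}
```

Given that the per-byte work is this trivial, my guess is that the slowdown is in how the decoder reads and buffers, not in the mapping itself.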