doudihuang7642 2018-06-14 15:08

How to improve file encoding conversion in Go

I've been working with some huge files that I have to convert to UTF-8. Because the files are enormous, traditional tools like iconv won't work, so I decided to write my own tool in Go. However, I noticed that this encoding conversion is quite slow in Go. Here is my code:

package main

import (
    "fmt"
    "io"
    "log"
    "os"

    "golang.org/x/text/encoding/charmap"
)

func main() {
    if len(os.Args) != 3 {
        fmt.Fprintf(os.Stderr, "usage:\n\t%s [input] [output]\n", os.Args[0])
        os.Exit(1)
    }

    f, err := os.Open(os.Args[1])

    if err != nil {
        log.Fatal(err)
    }

    out, err := os.Create(os.Args[2])

    if err != nil {
        log.Fatal(err)
    }

    r := charmap.ISO8859_1.NewDecoder().Reader(f)

    buf := make([]byte, 1048576)

    if _, err := io.CopyBuffer(out, r, buf); err != nil {
        log.Fatal(err)
    }

    out.Close()
    f.Close()
}

Similar code in Python is much more performant:

import codecs
BLOCKSIZE = 1048576 # or some other, desired size in bytes
with codecs.open("FRWAC-01.xml", "r", "latin_1") as sourceFile:
    with codecs.open("FRWAC-01-utf8.xml", "w", "utf-8") as targetFile:
        while True:
            contents = sourceFile.read(BLOCKSIZE)
            if not contents:
                break
            targetFile.write(contents)

I was sure my Go code would be much faster because I/O in Go is generally fast, but it turns out to be much slower than the Python code. Is there a way to improve the Go program?


  • doupao5296 2018-06-14 18:10

    The problem here is that you're not comparing the same code in both cases. Also, I/O speed in Go can't be significantly different from Python's, since both make the same syscalls.

    In the Python version, the files are buffered by default. In the Go version, even though you pass a 1048576-byte buffer to io.CopyBuffer, the decoder makes Read calls of whatever size it needs directly on the unbuffered file, and the decoded output reaches io.CopyBuffer in similarly small pieces, so the large buffer never fills up.

    Wrapping the file I/O with bufio produces results comparable to the Python version:

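    package main

    import (
        "bufio"
        "io"
        "log"
        "os"

        "golang.org/x/text/encoding/charmap"
    )

    func main() {
        if len(os.Args) != 3 {
            log.Fatalf("usage: %s [input] [output]", os.Args[0])
        }

        inFile, err := os.Open(os.Args[1])
        if err != nil {
            log.Fatal(err)
        }
        defer inFile.Close()

        outFile, err := os.Create(os.Args[2])
        if err != nil {
            log.Fatal(err)
        }
        defer outFile.Close()

        // Wrap both files in 1 MiB buffers so the decoder does not hit the files directly.
        in := bufio.NewReaderSize(inFile, 1<<20)
        out := bufio.NewWriterSize(outFile, 1<<20)

        r := charmap.ISO8859_1.NewDecoder().Reader(in)

        if _, err := io.Copy(out, r); err != nil {
            log.Fatal(err)
        }
        // Flush explicitly so an error from the final write is not silently dropped.
        if err := out.Flush(); err != nil {
            log.Fatal(err)
        }
    }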
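
Because ISO-8859-1 assigns each byte the Unicode code point with the same value (U+0000 to U+00FF), the conversion can also be sketched without golang.org/x/text. The version below swaps in a hand-rolled Latin-1 decoder in place of charmap.ISO8859_1 (it is only valid for Latin-1, not for other code pages) and does its own 1 MiB chunking, so no bufio wrapper is needed. latin1ToUTF8 and convertString are hypothetical names, the program reads stdin and writes stdout instead of taking file arguments, and it has not been benchmarked against the bufio version.

```go
package main

import (
	"bytes"
	"io"
	"log"
	"os"
	"strings"
)

// latin1ToUTF8 copies src to dst, re-encoding ISO-8859-1 bytes as UTF-8.
// It reads up to 1 MiB per Read and issues one Write per chunk, so the
// number of syscalls stays small even when src and dst are unbuffered files.
func latin1ToUTF8(dst io.Writer, src io.Reader) error {
	in := make([]byte, 1<<20)
	// Worst case: every input byte is >= 0x80 and becomes two UTF-8 bytes.
	out := make([]byte, 0, 2*len(in))
	for {
		n, err := src.Read(in)
		out = out[:0]
		for _, b := range in[:n] {
			if b < 0x80 {
				// ASCII range: identical in Latin-1 and UTF-8.
				out = append(out, b)
			} else {
				// U+0080..U+00FF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx.
				out = append(out, 0xC0|b>>6, 0x80|b&0x3F)
			}
		}
		if len(out) > 0 {
			if _, werr := dst.Write(out); werr != nil {
				return werr
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// convertString is a small in-memory helper for trying latin1ToUTF8.
func convertString(s string) (string, error) {
	var buf bytes.Buffer
	err := latin1ToUTF8(&buf, strings.NewReader(s))
	return buf.String(), err
}

func main() {
	// Filter usage (hypothetical binary name): latin1toutf8 < input.xml > output.xml
	if err := latin1ToUTF8(os.Stdout, os.Stdin); err != nil {
		log.Fatal(err)
	}
}
```

For example, convertString("caf\xe9") returns "café": the single Latin-1 byte 0xE9 becomes the UTF-8 pair 0xC3 0xA9.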
    
    This answer was selected by the asker as the best answer.
