doudihuang7642 2018-06-14 15:08

How to improve file encoding conversion in Go

I've been working with some huge files that I have to convert to UTF-8. Because the files are enormous, traditional tools like iconv won't work, so I decided to write my own tool in Go. However, I noticed that this encoding conversion is quite slow in Go. Here is my code:

package main

import (
    "fmt"
    "io"
    "log"
    "os"

    "golang.org/x/text/encoding/charmap"
)

func main() {
    if len(os.Args) != 3 {
        fmt.Fprintf(os.Stderr, "usage:\n\t%s [input] [output]\n", os.Args[0])
        os.Exit(1)
    }

    f, err := os.Open(os.Args[1])

    if err != nil {
        log.Fatal(err)
    }

    out, err := os.Create(os.Args[2])

    if err != nil {
        log.Fatal(err)
    }

    r := charmap.ISO8859_1.NewDecoder().Reader(f)

    buf := make([]byte, 1048576)

    io.CopyBuffer(out, r, buf)

    out.Close()
    f.Close()
}

Similar code in Python is much more performant:

import codecs
BLOCKSIZE = 1048576 # or some other, desired size in bytes
with codecs.open("FRWAC-01.xml", "r", "latin_1") as sourceFile:
    with codecs.open("FRWAC-01-utf8.xml", "w", "utf-8") as targetFile:
        while True:
            contents = sourceFile.read(BLOCKSIZE)
            if not contents:
                break
            targetFile.write(contents)

I was sure my Go code would be much quicker, because I/O in Go is generally fast, but it turns out to be much slower than the Python code. Is there a way to improve the Go program?

1 answer

  • doupao5296 2018-06-14 18:10

    The problem here is that you're not comparing the same code in both cases. Also, I/O speed in Go can't be significantly different from Python's, since both make the same syscalls.

    In the Python version, the files are buffered by default. In the Go version, although you're using io.CopyBuffer with a 1048576-byte buffer, the decoder makes Read calls of whatever size it needs directly on the unbuffered file.

    Wrapping the file I/O with bufio (which also needs "bufio" added to the imports) will produce comparable results.

    inFile, err := os.Open(os.Args[1])
    if err != nil {
        log.Fatal(err)
    }
    defer inFile.Close()
    
    outFile, err := os.Create(os.Args[2])
    if err != nil {
        log.Fatal(err)
    }
    defer outFile.Close()
    
    in := bufio.NewReaderSize(inFile, 1<<20)
    
    out := bufio.NewWriterSize(outFile, 1<<20)
    defer out.Flush()
    
    r := charmap.ISO8859_1.NewDecoder().Reader(in)
    
    if _, err := io.Copy(out, r); err != nil {
        log.Fatal(err)
    }
    
    This answer was accepted by the asker as the best answer.
