duandi8852752 2013-08-25 16:05
浏览 80
已采纳

文件读取和校验和。 方法之间的差异

Recently I'm into creating checksums for files in go. My code is working with small and big files. I tried two methods, the first uses ioutil.ReadFile("filename") and the second is working with os.Open("filename").

Examples:

The first function is working with the io/ioutil and works for small files. When I try to copy a big file my ram gets blastet and for a 1.5GB iso it uses 3GB of ram.

func byteCopy(fileToCopy string) {
    file, err := ioutil.ReadFile(fileToCopy) //1.5GB file
    omg(err)                                 //error handling function
    ioutil.WriteFile("2.iso", file, 0777)
    os.Remove("2.iso")
}

Even worse when I want to create a checksum with crypto/sha512 and io/ioutil. It will never finish and abort because it runs out of memory.

func ioutilHash() {
    file, _ := ioutil.ReadFile(iso)
    h := sha512.New()
    fmt.Printf("%x", h.Sum(file))
}

When using the function below everything works fine.

func ioHash() {
    f, err := os.Open(iso) //iso is a big ~ 1.5tb file
    omg(err)               //error handling function
    defer f.Close()
    h := sha512.New()
    io.Copy(h, f)
    fmt.Printf("%x", h.Sum(nil))
}

My Question:

Why is the ioutil.ReadFile() function not working right? The 1.5GB file should not fill my 16GB of ram. I don't know where to look right now. Could somebody explain the differences between the methods? I don't get it with reading the go-doc and examples. Having usable code is nice, but understanding why its working is way above that.

Thanks in advance!

  • 写回答

2条回答 默认 最新

  • duanju8308 2013-08-25 16:36
    关注

    The following code doesn't do what you think it does.

    func ioutilHash() {
        file, _ := ioutil.ReadFile(iso)
        h := sha512.New()
        fmt.Printf("%x", h.Sum(file))
    }
    

    This first reads your 1.5GB iso. As jnml pointed out, it continuously makes bigger and bigger buffers to fill it. In the end, And total buffer size is no less than 1.5GB and no greater than 1.875GB (by the current implementation).

    However, after that you then make another buffer! h.Sum(file) doesn't hash file. It appends the current hash to file! This may or may not cause yet another allocation.

    The real problem is that you are taking that file, now appended with the hash, and printing it with %x. Fmt actually pre-computes using the same type of method jnml pointed out that ioutil.ReadAll used. So it constantly allocated bigger and bigger buffers to store the hex of your file. Since each letter is 4 bits, that means we are talking about no less than a 3GB buffer for that and no greater than 3.75GB.

    This means your active buffers may be as big 5.625GB. Combine that with the GC not being perfect and not removing all the intermediate buffers, and it could very easily fill your space.


    The correct way to write that code would have been.

    func ioutilHash() {
        file, _ := ioutil.ReadFile(iso)
        h := sha512.New()
        h.Write(file)
        fmt.Printf("%x", h.Sum(nil))
    }
    

    This doesn't do nearly the number the allocations.


    The bottom line is that ReadFile is rarely what you want to use. IO streaming (using readers and writers) is always the best way when it is an option. Not only do you allocate much less when you use io.Copy, you also hash and read the disk concurrently. In your ReadFile example, the two resources are used synchronously when they don't depend on each other.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥20 ML307A在使用AT命令连接EMQX平台的MQTT时被拒绝
  • ¥20 腾讯企业邮箱邮件可以恢复么
  • ¥15 有人知道怎么将自己的迁移策略布到edgecloudsim上使用吗?
  • ¥15 错误 LNK2001 无法解析的外部符号
  • ¥50 安装pyaudiokits失败
  • ¥15 计组这些题应该咋做呀
  • ¥60 更换迈创SOL6M4AE卡的时候,驱动要重新装才能使用,怎么解决?
  • ¥15 让node服务器有自动加载文件的功能
  • ¥15 jmeter脚本回放有的是对的有的是错的
  • ¥15 r语言蛋白组学相关问题