File reading and checksums: differences between methods
(asked 2013-08-25 16:05)

Recently I've been creating checksums for files in Go. My code has to handle both small and big files. I tried two methods: the first uses ioutil.ReadFile("filename") and the second uses os.Open("filename").

Examples:

The first function uses io/ioutil and works for small files. When I try to copy a big file, my RAM gets blasted: for a 1.5GB ISO it uses 3GB of RAM.

func byteCopy(fileToCopy string) {
    file, err := ioutil.ReadFile(fileToCopy) //1.5GB file
    omg(err)                                 //error handling function
    ioutil.WriteFile("2.iso", file, 0777)
    os.Remove("2.iso")
}

It's even worse when I try to create a checksum with crypto/sha512 and io/ioutil: it never finishes and aborts because it runs out of memory.

func ioutilHash() {
    file, _ := ioutil.ReadFile(iso)
    h := sha512.New()
    fmt.Printf("%x", h.Sum(file))
}

When using the function below everything works fine.

func ioHash() {
    f, err := os.Open(iso) // iso is a big ~1.5GB file
    omg(err)               //error handling function
    defer f.Close()
    h := sha512.New()
    io.Copy(h, f)
    fmt.Printf("%x", h.Sum(nil))
}

My Question:

Why is the ioutil.ReadFile() function not working right? A 1.5GB file should not fill my 16GB of RAM. I don't know where to look right now. Could somebody explain the differences between the methods? Reading the Go docs and examples didn't make it clear to me. Having usable code is nice, but understanding why it works is worth far more.

Thanks in advance!

duanju8308 2013-08-25 16:36
The following code doesn't do what you think it does.

func ioutilHash() {
    file, _ := ioutil.ReadFile(iso)
    h := sha512.New()
    fmt.Printf("%x", h.Sum(file))
}

This first reads your 1.5GB ISO. As jnml pointed out, ReadFile keeps allocating bigger and bigger buffers to hold it. In the end, the total buffer size is no less than 1.5GB and no greater than 1.875GB (in the current implementation).

However, after that you make another buffer! h.Sum(file) doesn't hash file; it appends the current hash to file. Since nothing was ever written to h, that appended hash is the SHA-512 of empty input. This may or may not cause yet another allocation.

The real problem is that you take that file, now with the hash appended, and print it with %x. fmt builds its output in a buffer that grows the same way jnml pointed out ioutil.ReadAll's does, so it keeps allocating bigger and bigger buffers to hold the hex encoding of your file. Since each hex digit encodes 4 bits, every byte becomes two characters, which means a buffer of no less than 3GB and no greater than 3.75GB.

This means your live buffers may total as much as 5.625GB. Combine that with the GC not being perfect and not freeing all the intermediate buffers right away, and it could very easily fill your memory.
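The 5.625GB figure is the sum of the two upper bounds above. Assuming, as those bounds imply, that each growing buffer ends up at most 25% larger than the data it holds, and ignoring the 64-byte digest and any extra copy append may make, the arithmetic is:

```latex
\begin{aligned}
B_{\text{read}} &\in [\,1.5,\; 1.25 \times 1.5\,] = [\,1.5,\; 1.875\,]\ \text{GB} \\
B_{\text{hex}} &\in [\,2 \times 1.5,\; 1.25 \times 2 \times 1.5\,] = [\,3,\; 3.75\,]\ \text{GB} \\
B_{\text{read}} + B_{\text{hex}} &\le 1.875 + 3.75 = 5.625\ \text{GB}
\end{aligned}
```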

The correct way to write that code would have been:

func ioutilHash() {
    file, _ := ioutil.ReadFile(iso)
    h := sha512.New()
    h.Write(file)
    fmt.Printf("%x", h.Sum(nil))
}

This makes far fewer allocations.

The bottom line is that ReadFile is rarely what you want to use. IO streaming (using readers and writers) is always the best way when it is an option. Not only do you allocate much less when you use io.Copy, you also interleave reading the disk with hashing, so the two work on the file in small alternating chunks. In your ReadFile example, the disk and the CPU are used one after the other over the entire file, even though the work doesn't need to be ordered that way.
This answer was accepted by the asker as the best answer.
