Why is the MD5 hash of the tar part of a tar.gz wrong when read through a TeeReader?

I was just experimenting with archive/tar and compress/gzip, for automated processing of some backups I have.

My problem is this: I have various .tar files and .tar.gz files floating around, so I want to compute the MD5 hash of the .tar.gz file and the MD5 hash of the .tar file it contains, ideally in one run.

The example code I have so far works perfectly fine for the hashes of the files inside the .tar.gz as well as for the .tar.gz itself, but the hash for the .tar is wrong and I can't find out what the problem is.

I looked at the tar/reader.go file and I saw that there is some skipping in there, yet I thought everything should run over the io.Reader interface and thus the TeeReader should still catch all the bytes.

package main

import (
    "archive/tar"
    "compress/gzip"
    "crypto/md5"
    "fmt"
    "io"
    "os"
)

func main() {
    tgz, _ := os.Open("tb.tar.gz")
    gzMd5 := md5.New()
    gz, _ := gzip.NewReader(io.TeeReader(tgz, gzMd5))
    tarMd5 := md5.New()
    tr := tar.NewReader(io.TeeReader(gz, tarMd5))
    for {
        fileMd5 := md5.New()
        hdr, err := tr.Next()
        if err == io.EOF {
            break
        }
        io.Copy(fileMd5, tr)
        fmt.Printf("%x  %s\n", fileMd5.Sum(nil), hdr.Name)
    }
    fmt.Printf("%x  tb.tar\n", tarMd5.Sum(nil))
    fmt.Printf("%x  tb.tar.gz\n", gzMd5.Sum(nil))
}

Now for the following example:

$ echo "a" > a.txt
$ echo "b" > b.txt
$ tar cf tb.tar a.txt b.txt 
$ gzip -c tb.tar > tb.tar.gz
$ md5sum a.txt b.txt tb.tar tb.tar.gz

60b725f10c9c85c70d97880dfe8191b3  a.txt
3b5d5c3712955042212316173ccf37be  b.txt
501352dcd8fbd0b8e3e887f7dafd9392  tb.tar
90d6ba204493d8e54d3b3b155bb7f370  tb.tar.gz

On Linux Mint 14 (based on Ubuntu 12.04) with Go 1.0.2 from the Ubuntu repositories, the result of my Go program is:

$ go run tarmd5.go 
60b725f10c9c85c70d97880dfe8191b3  a.txt
3b5d5c3712955042212316173ccf37be  b.txt
a26ddab1c324780ccb5199ef4dc38691  tb.tar
90d6ba204493d8e54d3b3b155bb7f370  tb.tar.gz

So all hashes except the one for tb.tar are as expected. (Of course, if you retry this example, your .tar and .tar.gz hashes will differ from these because of different timestamps.)

Any hint about how to get it to work would be greatly appreciated. I would really prefer to do it in one run, though (with the TeeReaders).

1 answer
doushui20090526 2013-03-02 14:51
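The issue occurs because the tar reader doesn't read every byte from your reader: it stops once it has seen the end of the archive, so any bytes after that point (such as trailing padding) never pass through the TeeReader. After the loop over the entries has finished, you need to drain the reader to ensure every byte is read and hashed. The way I normally do this is to use io.Copy() to read until EOF.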

package main

import (
    "archive/tar"
    "compress/gzip"
    "crypto/md5"
    "fmt"
    "io"
    "io/ioutil"
    "os"
)

func main() {
    tgz, _ := os.Open("tb.tar.gz")
    gzMd5 := md5.New()
    gz, _ := gzip.NewReader(io.TeeReader(tgz, gzMd5))
    tarMd5 := md5.New()
    tee := io.TeeReader(gz, tarMd5) // need the reader later
    tr := tar.NewReader(tee)
    for {
        fileMd5 := md5.New()
        hdr, err := tr.Next()
        if err == io.EOF {
            break
        }
        io.Copy(fileMd5, tr)
        fmt.Printf("%x  %s\n", fileMd5.Sum(nil), hdr.Name)
    }
    io.Copy(ioutil.Discard, tee) // read unused portions of the tar file
    fmt.Printf("%x  tb.tar\n", tarMd5.Sum(nil))
    fmt.Printf("%x  tb.tar.gz\n", gzMd5.Sum(nil))
}
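To make the unread tail visible, here is a minimal sketch that swaps the hash for a byte counter and runs entirely in memory. The names countingWriter, buildPaddedTar and consumed are hypothetical helpers, not part of any library. The zero padding up to a multiple of 10240 bytes is an assumption that imitates the record padding GNU tar adds by default, and io.Discard assumes Go 1.16 or later (older versions would use ioutil.Discard as above).

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// countingWriter (hypothetical helper) records how many bytes were
// written to it; it stands in for the hash behind the TeeReader.
type countingWriter struct{ n int64 }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.n += int64(len(p))
	return len(p), nil
}

// buildPaddedTar creates a one-file archive in memory and appends zero
// bytes up to the next multiple of 10240, imitating GNU tar's default
// record padding (an assumption about how tb.tar was produced).
func buildPaddedTar() ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	body := []byte("a\n")
	hdr := &tar.Header{Name: "a.txt", Mode: 0644, Size: int64(len(body))}
	if err := tw.WriteHeader(hdr); err != nil {
		return nil, err
	}
	if _, err := tw.Write(body); err != nil {
		return nil, err
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	const record = 10240
	buf.Write(make([]byte, record-buf.Len()%record))
	return buf.Bytes(), nil
}

// consumed reports how many bytes passed through the tee by the time
// the tar reader returned io.EOF, and how many after draining the tee.
func consumed(archive []byte) (beforeDrain, afterDrain int64, err error) {
	cw := &countingWriter{}
	tee := io.TeeReader(bytes.NewReader(archive), cw)
	tr := tar.NewReader(tee)
	for {
		_, err = tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return 0, 0, err
		}
	}
	beforeDrain = cw.n
	// Pull the rest of the stream through the tee.
	if _, err = io.Copy(io.Discard, tee); err != nil {
		return 0, 0, err
	}
	return beforeDrain, cw.n, nil
}

func main() {
	archive, err := buildPaddedTar()
	if err != nil {
		panic(err)
	}
	before, after, err := consumed(archive)
	if err != nil {
		panic(err)
	}
	fmt.Printf("archive: %d bytes, read by tar: %d, after drain: %d\n",
		len(archive), before, after)
}
```

If the first count is smaller than the archive size, a hash fed by the same tee has only seen a prefix of the file, which would explain a wrong tb.tar sum.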

Another option is to just add io.Copy(tarMd5, gz) before your tarMd5.Sum() call. I think the first way is clearer even if I needed to add/modify four lines instead of one.

This answer was selected by the asker as the best answer.