dou4121 2013-03-02 12:37
Accepted

Why is the MD5 hash of the tar part of a tar.gz read through a TeeReader wrong?

I was just experimenting with archive/tar and compress/gzip for automated processing of some backups I have.

My problem is this: I have various .tar and .tar.gz files floating around, so I want to compute the MD5 hash of each .tar.gz file and the MD5 hash of the .tar inside it, ideally in a single pass.

The example code I have so far works perfectly for the hashes of the files inside the .tar.gz as well as for the .gz itself, but the hash of the .tar is wrong and I can't figure out why.

I looked at tar/reader.go and saw that it skips some data, but I thought everything went through the io.Reader interface, so the TeeReader should still see every byte.
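Whether a TeeReader in front of tar.Reader really sees every byte can be tested by counting how many bytes tar.Reader pulls from its input. The sketch below is a minimal illustration, not code from the thread: `countingReader`, `buildPaddedTar` and `consumed` are hypothetical helpers, the zero padding up to 10240 bytes is an assumption meant to imitate GNU tar's default record size, and `io.Discard` needs Go 1.16 or later (older versions use `ioutil.Discard`). It shows that tar.Reader returns io.EOF after reading the two zero-filled end-of-archive blocks, so any padding after them is never read, and therefore never reaches a reader wrapped around the input.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// countingReader is a hypothetical helper that counts the bytes read through it.
type countingReader struct {
	r io.Reader
	n int64
}

func (c *countingReader) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	c.n += int64(n)
	return n, err
}

// buildPaddedTar writes a two-file archive with archive/tar, then appends zero
// bytes up to a 10240-byte record (assumed to imitate GNU tar's default blocking).
func buildPaddedTar() ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, name := range []string{"a.txt", "b.txt"} {
		body := []byte(name[:1] + "\n")
		hdr := &tar.Header{Name: name, Mode: 0644, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(body); err != nil {
			return nil, err
		}
	}
	// Close writes the two zero-filled end-of-archive blocks.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	const record = 10240
	if rem := buf.Len() % record; rem != 0 {
		buf.Write(make([]byte, record-rem))
	}
	return buf.Bytes(), nil
}

// consumed reads every entry and reports how many bytes tar.Reader pulled
// from its input before Next returned io.EOF.
func consumed(archive []byte) (int64, error) {
	cr := &countingReader{r: bytes.NewReader(archive)}
	tr := tar.NewReader(cr)
	for {
		_, err := tr.Next()
		if err == io.EOF {
			return cr.n, nil
		}
		if err != nil {
			return 0, err
		}
		if _, err := io.Copy(io.Discard, tr); err != nil {
			return 0, err
		}
	}
}

func main() {
	archive, err := buildPaddedTar()
	if err != nil {
		panic(err)
	}
	n, err := consumed(archive)
	if err != nil {
		panic(err)
	}
	fmt.Printf("archive: %d bytes, read by tar.Reader: %d bytes\n", len(archive), n)
}
```

The byte count stays below the archive length: everything after the end-of-archive marker is left in the input, which is exactly the part a TeeReader never gets to hash.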

package main

import (
    "archive/tar"
    "compress/gzip"
    "crypto/md5"
    "fmt"
    "io"
    "log"
    "os"
)

func main() {
    tgz, err := os.Open("tb.tar.gz")
    if err != nil {
        log.Fatal(err)
    }
    defer tgz.Close()
    gzMd5 := md5.New()
    gz, err := gzip.NewReader(io.TeeReader(tgz, gzMd5))
    if err != nil {
        log.Fatal(err)
    }
    tarMd5 := md5.New()
    tr := tar.NewReader(io.TeeReader(gz, tarMd5))
    for {
        fileMd5 := md5.New()
        hdr, err := tr.Next()
        if err == io.EOF {
            break
        }
        if err != nil {
            log.Fatal(err)
        }
        if _, err := io.Copy(fileMd5, tr); err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%x  %s\n", fileMd5.Sum(nil), hdr.Name)
    }
    fmt.Printf("%x  tb.tar\n", tarMd5.Sum(nil))
    fmt.Printf("%x  tb.tar.gz\n", gzMd5.Sum(nil))
}

Now consider the following example:

$ echo "a" > a.txt
$ echo "b" > b.txt
$ tar cf tb.tar a.txt b.txt 
$ gzip -c tb.tar > tb.tar.gz
$ md5sum a.txt b.txt tb.tar tb.tar.gz

60b725f10c9c85c70d97880dfe8191b3  a.txt
3b5d5c3712955042212316173ccf37be  b.txt
501352dcd8fbd0b8e3e887f7dafd9392  tb.tar
90d6ba204493d8e54d3b3b155bb7f370  tb.tar.gz

On Linux Mint 14 (based on Ubuntu 12.10) with Go 1.0.2 from the Ubuntu repositories, my Go program prints:

$ go run tarmd5.go 
60b725f10c9c85c70d97880dfe8191b3  a.txt
3b5d5c3712955042212316173ccf37be  b.txt
a26ddab1c324780ccb5199ef4dc38691  tb.tar
90d6ba204493d8e54d3b3b155bb7f370  tb.tar.gz

So all hashes except the one for tb.tar are as expected. (If you rerun this example, your .tar and .tar.gz hashes will of course differ from these because of different timestamps.)

Any hint on how to get this to work would be greatly appreciated. I would really prefer to do it in one pass, though (with the TeeReaders).


1 answer

  • doushui20090526 2013-03-02 14:51

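    The issue occurs because tar.Reader doesn't read every byte from your reader. It stops as soon as it reaches the end-of-archive marker (two zero-filled 512-byte blocks), so anything after that, such as the zero padding tar typically adds to fill out the final record, never passes through the TeeReader and is never hashed. After the loop over the entries, you need to drain the reader so that every byte is read and hashed. I normally do this with io.Copy(), which reads until EOF.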

    package main
    
    import (
        "archive/tar"
        "compress/gzip"
        "crypto/md5"
        "fmt"
        "io"
        "io/ioutil"
        "log"
        "os"
    )
    
    func main() {
        tgz, err := os.Open("tb.tar.gz")
        if err != nil {
            log.Fatal(err)
        }
        defer tgz.Close()
        gzMd5 := md5.New()
        gz, err := gzip.NewReader(io.TeeReader(tgz, gzMd5))
        if err != nil {
            log.Fatal(err)
        }
        tarMd5 := md5.New()
        tee := io.TeeReader(gz, tarMd5) // need the reader later
        tr := tar.NewReader(tee)
        for {
            fileMd5 := md5.New()
            hdr, err := tr.Next()
            if err == io.EOF {
                break
            }
            if err != nil {
                log.Fatal(err)
            }
            if _, err := io.Copy(fileMd5, tr); err != nil {
                log.Fatal(err)
            }
            fmt.Printf("%x  %s\n", fileMd5.Sum(nil), hdr.Name)
        }
        // read unused portions of the tar file (e.g. trailing padding) through the tee
        if _, err := io.Copy(ioutil.Discard, tee); err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%x  tb.tar\n", tarMd5.Sum(nil))
        fmt.Printf("%x  tb.tar.gz\n", gzMd5.Sum(nil))
    }
    

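    Another option is to simply add io.Copy(tarMd5, gz) just before your tarMd5.Sum() call; that copies the unread tail of the decompressed stream straight into the tar hash. I think the first way is clearer, even though it meant adding or modifying four lines instead of one.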


    The asker selected this as the best answer.