dou4121 2013-03-02 20:37

Why is the md5 hash of the tar part of a tar.gz read through a TeeReader wrong?

I was experimenting with archive/tar and compress/gzip for automated processing of some backups I have.

My problem is this: I have various .tar and .tar.gz files floating around, so I want to extract both the MD5 hash of the .tar.gz file and the MD5 hash of the .tar file inside it, ideally in one run.

The example code I have so far works perfectly fine for the hashes of the files inside the .tar.gz as well as for the .gz itself, but the hash for the .tar is wrong and I can't figure out why.

I looked at tar/reader.go and saw that it does some skipping, yet I thought everything went through the io.Reader interface, so the TeeReader should still catch all the bytes.
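For reference, a minimal sketch of how io.TeeReader behaves (the helper `teeSeen` and the sample string are hypothetical): the tee copies exactly the bytes a caller pulls through Read, so anything a consumer never requests never reaches the writer.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// teeSeen (hypothetical helper) reads only n bytes from src through an
// io.TeeReader and returns what the tee copied to its writer.
func teeSeen(src string, n int) string {
	var seen bytes.Buffer
	tee := io.TeeReader(strings.NewReader(src), &seen)
	buf := make([]byte, n)
	if _, err := io.ReadFull(tee, buf); err != nil {
		panic(err)
	}
	// Only the n bytes actually read were written to seen.
	return seen.String()
}

func main() {
	fmt.Printf("%q\n", teeSeen("header+data+padding", 11)) // prints "header+data"
}
```

Nothing in the tee can "catch" the trailing `+padding` because nobody read it.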

package main

import (
    "archive/tar"
    "compress/gzip"
    "crypto/md5"
    "fmt"
    "io"
    "os"
)

func main() {
    tgz, _ := os.Open("tb.tar.gz")
    gzMd5 := md5.New()
    gz, _ := gzip.NewReader(io.TeeReader(tgz, gzMd5))
    tarMd5 := md5.New()
    tr := tar.NewReader(io.TeeReader(gz, tarMd5))
    for {
        fileMd5 := md5.New()
        hdr, err := tr.Next()
        if err == io.EOF {
            break
        }
        io.Copy(fileMd5, tr)
        fmt.Printf("%x  %s\n", fileMd5.Sum(nil), hdr.Name)
    }
    fmt.Printf("%x  tb.tar\n", tarMd5.Sum(nil))
    fmt.Printf("%x  tb.tar.gz\n", gzMd5.Sum(nil))
}

Now consider the following example:

$ echo "a" > a.txt
$ echo "b" > b.txt
$ tar cf tb.tar a.txt b.txt 
$ gzip -c tb.tar > tb.tar.gz
$ md5sum a.txt b.txt tb.tar tb.tar.gz

60b725f10c9c85c70d97880dfe8191b3  a.txt
3b5d5c3712955042212316173ccf37be  b.txt
501352dcd8fbd0b8e3e887f7dafd9392  tb.tar
90d6ba204493d8e54d3b3b155bb7f370  tb.tar.gz

On Linux Mint 14 (based on Ubuntu 12.04) with Go 1.0.2 from the Ubuntu repositories, my Go program prints:

$ go run tarmd5.go 
60b725f10c9c85c70d97880dfe8191b3  a.txt
3b5d5c3712955042212316173ccf37be  b.txt
a26ddab1c324780ccb5199ef4dc38691  tb.tar
90d6ba204493d8e54d3b3b155bb7f370  tb.tar.gz

So all hashes except the one for tb.tar are as expected. (Of course, if you retry this example, your .tar and .tar.gz hashes will differ from these because of different timestamps.)

Any hint on how to get this to work would be greatly appreciated, though I would really prefer to do it in one run (with the TeeReaders).


  • doushui20090526 2013-03-02 22:51

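    The issue occurs because tar.Reader doesn't read every byte from your reader: once Next() has returned io.EOF, whatever follows the end-of-archive marker (typically the zero padding tar appends to fill out its last record) is never requested, so it never passes through the TeeReader. After hashing all the files, you need to drain the reader so that every byte is read and hashed. I normally do this with io.Copy(), reading until EOF.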

    package main
    
    import (
        "archive/tar"
        "compress/gzip"
        "crypto/md5"
        "fmt"
        "io"
        "io/ioutil"
        "os"
    )
    
    func main() {
        tgz, _ := os.Open("tb.tar.gz")
        gzMd5 := md5.New()
        gz, _ := gzip.NewReader(io.TeeReader(tgz, gzMd5))
        tarMd5 := md5.New()
        tee := io.TeeReader(gz, tarMd5) // need the reader later
        tr := tar.NewReader(tee)
        for {
            fileMd5 := md5.New()
            hdr, err := tr.Next()
            if err == io.EOF {
                break
            }
            io.Copy(fileMd5, tr)
            fmt.Printf("%x  %s\n", fileMd5.Sum(nil), hdr.Name)
        }
        io.Copy(ioutil.Discard, tee) // read unused portions of the tar file
        fmt.Printf("%x  tb.tar\n", tarMd5.Sum(nil))
        fmt.Printf("%x  tb.tar.gz\n", gzMd5.Sum(nil))
    }
    

    Another option is to simply add io.Copy(tarMd5, gz) before your tarMd5.Sum() call. I think the first way is clearer, even though it meant adding or modifying four lines instead of one.

    This answer was accepted by the asker as the best answer.