dtsfnyay300457 2018-12-14 20:59

sha256 sum does not match the output of the gzip command

I am trying to compute the sha256 sum of a gzipped file in Go, but my output does not match that of the gzip command.

I have a function Compress that gzips the contents of an io.Reader, a file in my case.

func Compress(r io.Reader) (io.Reader, error) {
    var buf bytes.Buffer
    zw := gzip.NewWriter(&buf)
    if _, err := io.Copy(zw, r); err != nil {
        return nil, err
    }
    if err := zw.Close(); err != nil {
        return nil, err
    }
    return &buf, nil
}

Then I have a function Sum256 that computes the sha256 sum of a reader.

func Sum256(r io.Reader) (sum []byte, err error) {
    h := sha256.New()
    if _, err := io.Copy(h, r); err != nil {
        return nil, err
    }
    return h.Sum(nil), nil
}

My main function opens a file, gzips it, then computes the sha256 sum of the zipped contents. The problem is that the output does not match that of the gzip command. The input file hello.txt contains a single line with the word hello with no newline at the end.

func main() {
    uncompressed, err := os.Open("hello.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer uncompressed.Close()

    sum, err := Sum256(uncompressed)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%x  %s\n", sum, uncompressed.Name())

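    // Rewind the file so Compress reads it from the start.
    if _, err := uncompressed.Seek(0, io.SeekStart); err != nil {
        log.Fatal(err)
    }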
    compressed, err := Compress(uncompressed)
    if err != nil {
        log.Fatal(err)
    }

    sum, err = Sum256(compressed)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%x  %s.gz\n", sum, uncompressed.Name())
}

gzip results:

$ sha256sum hello.txt
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824  hello.txt

$ gzip -c hello.txt | sha256sum
809d7f11e97291d06189e82ca09a1a0a4a66a3c85a24ac7ff389ae6fbe02bcce  -

$ gzip -nc hello.txt | sha256sum
f901eda57fd86d4239806fd4b76f64036c1c20711267a7bc776ab2aa45069b2a  -

My program results:

$ go run main.go
# match
2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824  hello.txt
# mismatch
3429ae8bc6346f1e4fb67b7d788f85f4637e726a725cf4b66c521903d0ab3b07  hello.txt.gz

Any idea why the outputs don't match, or how to fix this? I have tried using an io.Pipe, an ioutil.TempFile, and other methods, all with the same result.


1 answer

  • douqie1852 2018-12-14 21:10

    Note that if you run the command:

    gzip -c hello.txt
    

    The output will contain the filename, hello.txt. You can see this with hexdump:

    $ touch hello.txt; gzip -c hello.txt | hexdump -C
    00000000  1f 8b 08 08 ad 1b 14 5c  00 03 68 65 6c 6c 6f 2e  |.......\..hello.|
    00000010  74 78 74 00 03 00 00 00  00 00 00 00 00 00        |txt...........|
    0000001e
    

    If you just copy data into a gzip stream in your program, the filename won't be there: Go's gzip.Writer leaves the Name header field empty unless you set it. The header bytes therefore differ, and so does the SHA-256 sum.

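    However, even if you fix this particular difference, you are still not guaranteed to get the same bytes by running gzip on the same file. The format does not prescribe a single compressed encoding, so different implementations, versions, or compression levels can produce different output for identical input.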

    If you want the checksum to be the same, run the checksum on the decompressed data instead.
