2018-03-29 01:16


I have some large files I would like to AES-encrypt before sending them over the wire or saving them to disk. While it seems possible to encrypt streams, there seem to be warnings against doing this; instead, people recommend splitting the files into chunks and using GCM or crypto/nacl/secretbox.

Processing streams of data is more difficult due to the authenticity requirement. We can’t encrypt-then-MAC: by its nature, we usually don’t know the size of a stream. We can’t send the MAC after the stream is complete, as that usually is indicated by the stream being closed. We can’t decrypt a stream on the fly, because we have to see the entire ciphertext in order to check the MAC. Attempting to secure a stream adds enormous complexity to the problem, with no good answers. The solution is to break the stream into discrete chunks, and treat them as messages.

Files are segmented into 4KiB blocks. Each block gets a fresh random 128-bit IV each time it is modified. A 128-bit authentication tag (GHASH) protects each block from modifications.
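A minimal Go sketch of the per-block scheme in this quote, assuming AES-GCM from the standard library with `cipher.NewGCMWithNonceSize` so that a 128-bit random nonce can be used (GCM's default nonce is 96 bits). The helper names `sealBlock`, `openBlock` and `newGCM16`, and the layout nonce || ciphertext || tag, are assumptions for illustration. Each sealed block grows by 32 bytes (16-byte nonce plus 16-byte tag). On its own, this does not tie a block to its position in the file.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

const blockSize = 4096 // plaintext bytes per block
const nonceSize = 16   // 128-bit random IV, drawn fresh every time a block is written

// newGCM16 returns AES-GCM configured for 16-byte nonces.
func newGCM16(key []byte) (cipher.AEAD, error) {
	b, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCMWithNonceSize(b, nonceSize)
}

// sealBlock encrypts one block of at most blockSize bytes and returns
// nonce || ciphertext || 16-byte tag.
func sealBlock(key, plain []byte) ([]byte, error) {
	if len(plain) > blockSize {
		return nil, errors.New("block too large")
	}
	gcm, err := newGCM16(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, nonceSize)
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plain, nil), nil
}

// openBlock checks the tag and decrypts the output of sealBlock.
func openBlock(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM16(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < nonceSize+gcm.Overhead() {
		return nil, errors.New("block too short")
	}
	return gcm.Open(nil, sealed[:nonceSize], sealed[nonceSize:], nil)
}
```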

If a large amount of data is decrypted, it is not always possible to buffer all decrypted data until the authentication tag is verified. Splitting the data into small chunks fixes the problem of deferred authentication checks but introduces a new one. The chunks can be reordered... ...because every chunk is encrypted separately. Therefore the order of the chunks must be encoded somehow into the chunks themselves to be able to detect rearranging any number of chunks.
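One way to encode the order into the chunks is sketched below, assuming AES-GCM with the chunk index passed as additional authenticated data (AAD). The index is authenticated but never stored, so opening a chunk at any position other than the one it was sealed for fails. The helper names are hypothetical. Because each chunk uses a random 96-bit nonce, a single key should not be used for an unbounded number of chunks (NIST SP 800-38D limits random-IV GCM to 2^32 invocations per key).

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/binary"
	"errors"
)

// indexAAD encodes a chunk's position as 8 big-endian bytes.
func indexAAD(index uint64) []byte {
	aad := make([]byte, 8)
	binary.BigEndian.PutUint64(aad, index)
	return aad
}

// newAEAD returns AES-GCM with the standard 12-byte nonce.
func newAEAD(key []byte) (cipher.AEAD, error) {
	b, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(b)
}

// sealChunk encrypts chunk number index under a random nonce. The index is
// authenticated as AAD but not written to the output.
func sealChunk(aead cipher.AEAD, index uint64, plain []byte) ([]byte, error) {
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plain, indexAAD(index)), nil
}

// openChunk succeeds only if sealed was produced for this exact index.
func openChunk(aead cipher.AEAD, index uint64, sealed []byte) ([]byte, error) {
	n := aead.NonceSize()
	if len(sealed) < n+aead.Overhead() {
		return nil, errors.New("chunk too short")
	}
	return aead.Open(nil, sealed[:n], sealed[n:], indexAAD(index))
}
```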

Can anyone with actual cryptography experience point me in the right direction?


I realized after asking this question that there is a difference between simply not being able to fit the whole byte stream into memory (encrypting a 10GB file) and the byte stream also being of unknown length, possibly continuing long after its start needs to be decoded (a 24-hour live video stream).

I am mostly interested in large blobs where the end of the stream can be reached before the beginning needs to be decoded. In other words, encryption that does not require the whole plaintext/ciphertext to be loaded into memory at the same time.


2 answers

  • du9826 2018-03-29 01:48

    As you've already discovered from your research, there isn't much of an elegant solution for authenticated encryption of large files.

    There are traditionally two ways to approach this problem:

    • Split the file into chunks, encrypt each chunk individually and let each chunk have its own authentication tag. AES-GCM would be the best mode to use for this. This method makes the output grow in proportion to the size of the file, since every chunk carries its own nonce and tag. You'll also need a unique nonce for each chunk and a way to indicate where chunks begin and end.

    • Encrypt using AES-CTR with a buffer, and call Hash.Write on an HMAC for each buffer of encrypted data. The benefit is that encryption can be done in one pass, and the file size stays the same apart from about 48 bytes for the IV and the HMAC result. The downside is that decryption requires one pass to validate the HMAC and then another pass to actually decrypt.
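To put numbers on the size difference, the sketch below computes the output size of each option, assuming a 12-byte GCM nonce and a 16-byte tag stored with every chunk for the first one, and a 16-byte IV plus a 32-byte HMAC-SHA256 tag for the second. The function names are hypothetical. For a 10 GiB file, 4 KiB chunks add 2,621,440 × 28 = 73,400,320 bytes (70 MiB, about 0.68%), and 64 KiB chunks add 4,587,520 bytes (4.375 MiB); the second option adds a fixed 48 bytes.

```go
package main

// chunkedSize returns the size of an n-byte plaintext after splitting it into
// ceil(n/chunkSize) chunks and storing a nonce and a tag with each chunk.
func chunkedSize(n, chunkSize, nonceSize, tagSize int64) int64 {
	chunks := (n + chunkSize - 1) / chunkSize
	return n + chunks*(nonceSize+tagSize)
}

// ctrHmacSize is the second option: one 16-byte IV and one 32-byte
// HMAC-SHA256 tag for the whole file.
func ctrHmacSize(n int64) int64 {
	return n + 16 + 32
}
```

Larger chunks shrink the overhead but increase how much plaintext must be buffered before its tag can be checked.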

    Neither is ideal, but for very large files (~2GB or more), the second option is probably preferred.

    Below is an example of encryption in Go using the second method. The output file ends with 48 extra bytes: the IV (16 bytes) followed by the HMAC result (32 bytes). Note that the IV is HMACed as well.

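    import (
        "crypto/aes"
        "crypto/cipher"
        "crypto/hmac"
        "crypto/rand"
        "crypto/sha256"
        "io"
        "os"
    )

    const BUFFER_SIZE int = 4096
    const IV_SIZE int = 16

    // encrypt writes the AES-CTR ciphertext of filePathIn to filePathOut, then
    // appends the IV (16 bytes) and HMAC-SHA256(ciphertext || IV) (32 bytes).
    func encrypt(filePathIn, filePathOut string, keyAes, keyHmac []byte) error {
        inFile, err := os.Open(filePathIn)
        if err != nil { return err }
        defer inFile.Close()
        outFile, err := os.Create(filePathOut)
        if err != nil { return err }
        defer outFile.Close()
        iv := make([]byte, IV_SIZE)
        if _, err = rand.Read(iv); err != nil { return err }
        block, err := aes.NewCipher(keyAes) // named block so the aes package is not shadowed
        if err != nil { return err }
        ctr := cipher.NewCTR(block, iv)
        mac := hmac.New(sha256.New, keyHmac)
        buf := make([]byte, BUFFER_SIZE)
        for {
            n, err := inFile.Read(buf)
            if err != nil && err != io.EOF { return err }
            outBuf := make([]byte, n)
            ctr.XORKeyStream(outBuf, buf[:n])
            mac.Write(outBuf) // MAC the ciphertext (encrypt-then-MAC)
            if _, werr := outFile.Write(outBuf); werr != nil { return werr }
            if err == io.EOF { break }
        }
        // Append the IV and include it in the MAC, then append the 32-byte tag.
        if _, err = outFile.Write(iv); err != nil { return err }
        mac.Write(iv)
        _, err = outFile.Write(mac.Sum(nil))
        return err
    }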
  • douzi115522 2018-03-29 15:48

    Using an HMAC after encryption is a valid method. However, HMAC can be pretty slow, especially with SHA-2. You could do the same with GMAC, the underlying MAC of GCM. It may be tricky to find a standalone implementation, but since GCM computes GMAC over the ciphertext, you can simply perform it as a separate step if you really want. There are other options as well, such as Poly1305, which was originally specified with AES (Poly1305-AES); TLS 1.2 and 1.3 pair it with ChaCha20 instead.
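Go's standard library has no standalone GMAC type, but GMAC is GCM with an empty plaintext and the data to authenticate supplied as additional data, so it can be sketched with `cipher.NewGCM`. The helper names are hypothetical. Note that this API takes the whole input as one slice rather than accepting incremental writes, which illustrates why a streaming GMAC implementation is harder to find. As with GCM, a nonce must never be reused under the same key.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"errors"
)

// newGMAC returns AES-GCM, used here purely as a MAC.
func newGMAC(key []byte) (cipher.AEAD, error) {
	b, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(b)
}

// gmacTag returns a 16-byte GMAC tag over data: GCM with an empty plaintext
// and data supplied as additional authenticated data.
func gmacTag(key, nonce, data []byte) ([]byte, error) {
	gcm, err := newGMAC(key)
	if err != nil {
		return nil, err
	}
	if len(nonce) != gcm.NonceSize() {
		return nil, errors.New("nonce must be 12 bytes")
	}
	return gcm.Seal(nil, nonce, nil, data), nil
}

// gmacVerify recomputes the tag inside Open, which compares tags in constant time.
func gmacVerify(key, nonce, data, tag []byte) bool {
	gcm, err := newGMAC(key)
	if err != nil || len(nonce) != gcm.NonceSize() {
		return false
	}
	_, err = gcm.Open(nil, nonce, tag, data)
	return err == nil
}
```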

    For GCM (or CCM, EAX, or any other authenticated cipher mode) you need to authenticate the order of the chunks. You could do this by creating a separate file encryption key and then using the nonce input (the 12-byte IV) to carry the chunk number. This removes the need to store an IV per chunk and makes sure the chunks are in order. You can generate the file encryption key using a KDF (if you have a unique way to identify the file) or by wrapping a random key with a master key.
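A sketch of the counter-nonce approach, assuming a per-file AES key is already available (how it is derived or wrapped is outside the sketch). Bytes 3 to 10 of the 12-byte nonce carry the chunk number; byte 11, a flag marking the final chunk, is an addition not in the answer, so that chunks dropped from the end are also detected. Every file reuses the nonces 0, 1, 2, ..., which is only safe because each file has its own key. For brevity the helpers work on in-memory chunk slices and their names are hypothetical; a streaming version would seal one chunk at a time.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"errors"
	"fmt"
)

// chunkNonce builds the 12-byte GCM nonce: bytes 3..10 hold the chunk number,
// byte 11 is 1 only for the final chunk.
func chunkNonce(index uint64, last bool) []byte {
	nonce := make([]byte, 12)
	binary.BigEndian.PutUint64(nonce[3:11], index)
	if last {
		nonce[11] = 1
	}
	return nonce
}

func newFileAEAD(fileKey []byte) (cipher.AEAD, error) {
	b, err := aes.NewCipher(fileKey)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(b)
}

// sealChunks encrypts the chunks of one file in order. Only ciphertext and tag
// are stored: the nonce is recomputed from each chunk's position.
func sealChunks(fileKey []byte, chunks [][]byte) ([][]byte, error) {
	if len(chunks) == 0 {
		return nil, errors.New("need at least one (possibly empty) chunk")
	}
	aead, err := newFileAEAD(fileKey)
	if err != nil {
		return nil, err
	}
	out := make([][]byte, len(chunks))
	for i, c := range chunks {
		out[i] = aead.Seal(nil, chunkNonce(uint64(i), i == len(chunks)-1), c, nil)
	}
	return out, nil
}

// openChunks fails if any chunk was modified, moved, or dropped from the end.
func openChunks(fileKey []byte, sealed [][]byte) ([][]byte, error) {
	if len(sealed) == 0 {
		return nil, errors.New("no chunks")
	}
	aead, err := newFileAEAD(fileKey)
	if err != nil {
		return nil, err
	}
	out := make([][]byte, len(sealed))
	for i, s := range sealed {
		p, err := aead.Open(nil, chunkNonce(uint64(i), i == len(sealed)-1), s, nil)
		if err != nil {
			return nil, fmt.Errorf("chunk %d: %w", i, err)
		}
		out[i] = p
	}
	return out, nil
}
```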

