2009-05-30 13:50


I'm creating something that includes a file upload service of sorts, and I need to store data compressed with zlib's compress() function. I send it across the internet already compressed, but I need to know the uncompressed file size on the remote server. Is there any way I can figure out this information without uncompress()ing the data on the server first, just for efficiency? That's how I'm doing it now, but if there's a shortcut I'd love to take it.

By the way, why is it called uncompress? That sounds pretty terrible to me, I always thought it would be decompress...




3 answers

  • drcmue4619 2009-05-30 15:19

    The zlib format has no field for the original input size, so I doubt you will be able to do that without at least simulating a decompression of the data. The gzip format has an "input size" (ISIZE) field that you could use (it holds the original size modulo 2^32), but maybe you want to avoid changing the compression format or having the clients send the file size.
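
    If switching to gzip is an option, the ISIZE trailer can be read without inflating anything. The following is a minimal sketch, assuming the whole single-member gzip stream is already in memory; gzip_read_isize is a hypothetical helper name, not part of zlib. ISIZE is written by the sender and wraps at 2^32, so treat it as a hint rather than a verified size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reads the ISIZE trailer of a single-member gzip
   stream held in memory. Returns 0 on success, -1 if the buffer is too
   short or does not start with the gzip magic bytes. */
int gzip_read_isize(const uint8_t *gz, size_t len, uint32_t *isize)
{
    assert(gz != NULL && isize != NULL);

    /* A gzip member has a 10-byte header and an 8-byte trailer at minimum. */
    if (len < 18 || gz[0] != 0x1f || gz[1] != 0x8b)
        return -1;

    /* ISIZE is the last 4 bytes of the member, stored little-endian. */
    const uint8_t *p = gz + len - 4;
    *isize = (uint32_t)p[0]
           | (uint32_t)p[1] << 8
           | (uint32_t)p[2] << 16
           | (uint32_t)p[3] << 24;
    return 0;
}
```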

    But even if you use a different format, if you don't trust the clients you would still need to run a more expensive check to make sure the uncompressed data is the size the client says it is. In this case, what you can do is to make the uncompress-to-/dev/null process less expensive, making sure zlib doesn't write the output data anywhere, as you just want to know the uncompressed size.

  • duanjian7617 2009-05-30 13:54

    I doubt it. From memory, I don't believe this is something the underlying zlib library provides (although it's been a good 7 or 8 years since I used it, the up-to-date docs don't seem to indicate this feature has been added).

    One possibility would be to transfer a second file containing the uncompressed size alongside the compressed data, but that seems fraught with danger, especially if you get the size wrong.

    Another alternative, if uncompressing on the server is expensive but doesn't have to be done immediately, is to do it in a lower-priority background task (for example with nice under Linux). But again, there may be drawbacks if the size checker starts running behind (too many uploads coming in).

    And I tend to think of decompression in terms of "explosive decompression", not a good term to use :-)

  • douxiajia6104 2009-05-30 15:30

    If you're uploading using the raw 'compress' format, then you won't have information on the size of the data that's being uploaded. Pax is correct in this regard.
    You can store it as a 4-byte header at the start of the compression buffer, assuming that the file size doesn't exceed 4GB.
    Some C code as an example (it needs <stdint.h>, <stdlib.h>, <string.h>, <zlib.h> and, for htonl/ntohl, <arpa/inet.h>):

     // 4-byte size header followed by zlib's worst-case output for bufsize input bytes.
     uLongf compressedSize = compressBound(bufsize);
     uint8_t *compressBuffer = malloc(sizeof (uint32_t) + compressedSize);
     uint32_t header = htonl((uint32_t)filesize);  // network byte order, always 4 bytes
     memcpy(compressBuffer, &header, sizeof header);
     compress(compressBuffer + sizeof header, &compressedSize, sourceBuffer, bufsize);

    Then you send the complete compressBuffer, of size compressedSize + sizeof (uint32_t). When you receive it on the server side you can use the following code to get the data back:

     // data is in compressBuffer; assume you already know the compressed size.
     uint32_t header;
     memcpy(&header, compressBuffer, sizeof header);
     uLongf originalSize = ntohl(header);
     uint8_t *realCompressBuffer = compressBuffer + sizeof header;
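
    A variation on the header idea that avoids depending on the host's byte order or on the width of uLongf is to write and read the four bytes explicitly. This is only a sketch under assumptions: the big-endian layout, the helper names put_size_header and parse_size_header, and the max_size limit are all made up for illustration. The receiver also rejects messages too short to hold a header plus payload, and declared sizes above its own limit, before touching zlib at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE_HEADER_LEN 4

/* Sender side (hypothetical helper): store the original size as
   4 big-endian bytes at the front of buf. */
void put_size_header(uint8_t *buf, uint32_t original_size)
{
    assert(buf != NULL);
    buf[0] = (uint8_t)(original_size >> 24);
    buf[1] = (uint8_t)(original_size >> 16);
    buf[2] = (uint8_t)(original_size >> 8);
    buf[3] = (uint8_t)original_size;
}

/* Receiver side (hypothetical helper): split a received message into the
   declared size and the zlib payload. Returns 0 on success, -1 if the
   message has no payload or declares more than max_size bytes. */
int parse_size_header(const uint8_t *msg, size_t msg_len, uint32_t max_size,
                      uint32_t *declared,
                      const uint8_t **payload, size_t *payload_len)
{
    assert(msg != NULL && declared != NULL);
    assert(payload != NULL && payload_len != NULL);

    if (msg_len <= SIZE_HEADER_LEN)
        return -1;

    uint32_t n = (uint32_t)msg[0] << 24
               | (uint32_t)msg[1] << 16
               | (uint32_t)msg[2] << 8
               | (uint32_t)msg[3];
    if (n > max_size)
        return -1;

    *declared = n;
    *payload = msg + SIZE_HEADER_LEN;
    *payload_len = msg_len - SIZE_HEADER_LEN;
    return 0;
}
```

    The declared size is still only what the client claims; it tells the server how much to allocate or whether to reject the upload early, not what the data will actually inflate to.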

    If you don't trust the client to send the correct size then you will need to perform some sort of uncompressed data check on the server side. The suggestion of uncompressing to /dev/null is a reasonable one.
    If you're uploading a .zip file, it contains a directory which tells you the size of each file when it's uncompressed. This information is built into the file format, though again it is subject to malicious clients.

