2014-12-16 12:38


I'm reading in and at the same time parsing (decoding) a file in a custom format, which is compressed with zlib. My question is how can I efficiently uncompress and then parse the uncompressed content without growing the slice? I would like to parse it whilst reading it into a reusable buffer.

This is for a speed-sensitive application and so I'd like to read it in as efficiently as possible. Normally I would just ioutil.ReadAll and then loop again through the data to parse it. This time I'd like to parse it as it's read, without having to grow the buffer into which it is read, for maximum efficiency.

Basically, I'm thinking that if I can find a buffer of the perfect size, I can read into it, parse it, then overwrite the buffer with the next read, parse that, and so on. The issue is that the zlib reader appears to return an arbitrary number of bytes each time Read(b) is called; it does not fill the slice. Because of this I don't know what the perfect buffer size would be. I'm concerned that it might split some of the data I wrote across two chunks, making it difficult to parse: a uint64, say, could be split across two reads and so never appear whole in a single buffer. Or perhaps that can never happen, and the data always comes back in chunks of the same size as were originally written?

  1. What is the optimal buffer size, or is there a way to calculate this?
  2. If I have written data into the zlib writer with f.Write(b []byte) is it possible that this same data could be split into two reads when reading back the compressed data (meaning I will have to have a history during parsing), or will it always come back in the same read?


