drsqpko5286 2014-12-16 12:38 Acceptance rate: 0%
Views: 424
Accepted

Most efficient way to read a zlib-compressed file in Golang?

I'm reading in and at the same time parsing (decoding) a file in a custom format, which is compressed with zlib. My question is how can I efficiently uncompress and then parse the uncompressed content without growing the slice? I would like to parse it whilst reading it into a reusable buffer.

This is for a speed-sensitive application and so I'd like to read it in as efficiently as possible. Normally I would just ioutil.ReadAll and then loop again through the data to parse it. This time I'd like to parse it as it's read, without having to grow the buffer into which it is read, for maximum efficiency.

Basically, I'm thinking that if I can find a buffer of the perfect size, I can read into it, parse it, overwrite the buffer with the next read, parse that, and so on. The issue is that the zlib reader appears to return an arbitrary number of bytes each time Read(b) is called; it does not fill the slice. Because of this I don't know what the perfect buffer size would be. I'm concerned that it might break some of the data I wrote into two chunks, making it difficult to parse: a uint64, say, could be split across two reads and so not appear whole in a single buffer. Or perhaps that can never happen, and the data is always read out in chunks of the same size as were originally written?

  1. What is the optimal buffer size, or is there a way to calculate this?
  2. If I have written data into the zlib writer with f.Write(b []byte) is it possible that this same data could be split into two reads when reading back the compressed data (meaning I will have to have a history during parsing), or will it always come back in the same read?

2 answers

  • douhoujun9304 2014-12-16 17:36

    OK, so I figured this out in the end using my own implementation of a reader.

    Basically the struct looks like this:

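    type reader struct {
        at  int           // offset of the first unread byte in buf
        n   int           // number of unread bytes in buf
        f   io.ReadCloser // the zlib reader
        buf []byte        // reusable read buffer
    }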
    

    This can be attached to the zlib reader:

    // Open file for reading
    fi, err := os.Open(filename)
    if err != nil {
        return nil, err
    }
    defer fi.Close()
    // Attach zlib reader
    r := new(reader)
    r.buf = make([]byte, 2048)
    r.f, err = zlib.NewReader(fi)
    if err != nil {
        return nil, err
    }
    defer r.f.Close()
    

    Then x number of bytes can be read straight out of the zlib reader using a function like this:

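    mydata := r.readx(10)

    // readx returns the next x bytes of decompressed data.
    // It panics on a read error or if x is larger than the buffer.
    func (r *reader) readx(x int) []byte {
        if x > len(r.buf) {
            panic("readx: request larger than buffer")
        }
        for r.n < x {
            // Move the unread bytes to the front, then refill the rest.
            copy(r.buf, r.buf[r.at:r.at+r.n])
            r.at = 0
            m, err := r.f.Read(r.buf[r.n:])
            r.n += m // Read may return data together with an error
            if err != nil && r.n < x {
                panic(err)
            }
        }
        tmp := make([]byte, x)
        copy(tmp, r.buf[r.at:r.at+x]) // copy out: r.buf is overwritten by later reads
        r.at += x
        r.n -= x
        return tmp
    }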
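For fixed-size values, a similar effect can probably be had from the standard library alone: io.ReadFull keeps calling Read until the supplied buffer is full, so the caller never sees a value split across two reads. The sketch below is a minimal illustration, not the method used above. It assumes a made-up record layout (8-byte big-endian uint64 values), and the helper names compress and readUint64s are hypothetical.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"encoding/binary"
	"fmt"
	"io"
)

// compress writes each value as 8 big-endian bytes through a zlib writer.
// The record layout is an assumption made for this example.
func compress(vals []uint64) ([]byte, error) {
	var out bytes.Buffer
	zw := zlib.NewWriter(&out)
	var rec [8]byte
	for _, v := range vals {
		binary.BigEndian.PutUint64(rec[:], v)
		if _, err := zw.Write(rec[:]); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

// readUint64s decodes the values one at a time into a single reusable buffer.
func readUint64s(compressed []byte) ([]uint64, error) {
	zr, err := zlib.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()

	var vals []uint64
	buf := make([]byte, 8) // reused for every record
	for {
		_, err := io.ReadFull(zr, buf)
		if err == io.EOF {
			return vals, nil // clean end of stream between records
		}
		if err != nil {
			return nil, err // e.g. io.ErrUnexpectedEOF for a truncated record
		}
		vals = append(vals, binary.BigEndian.Uint64(buf))
	}
}

func main() {
	data, err := compress([]uint64{1, 2, 1 << 40})
	if err != nil {
		panic(err)
	}
	vals, err := readUint64s(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(vals)
}
```

Each value was written with its own Write call here, but the reading side does not depend on that: io.ReadFull assembles 8 bytes regardless of how the zlib reader chunks its output.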
    

    Note that I have no need to check for EOF because my parser should stop itself at the right place.

    This answer was selected by the asker as the best answer.
