dongqiancui9194 2017-04-04 05:44
Views: 53
Accepted

Using Gob to write logs to a file in append style

Would it be possible to use Gob encoding for appending structs in series to the same file using append? It works for writing, but when reading with the decoder more than once I run into:

extra data in buffer

So I wonder whether that's possible in the first place, or whether I should instead use something like JSON and append one JSON document per line. The alternative would be to serialize a slice, but then reading it back as a whole would defeat the purpose of appending.


1 answer

  • drno94939847 2017-04-04 06:30

    The gob package wasn't designed to be used this way. A gob stream has to be written by a single gob.Encoder, and it also has to be read by a single gob.Decoder.

    The reason is that the gob package not only serializes the values you pass to it, it also transmits data describing their types:

    A stream of gobs is self-describing. Each data item in the stream is preceded by a specification of its type, expressed in terms of a small set of predefined types.

    This is state held by the encoder / decoder (which types have been transmitted, and how); a subsequent, new encoder / decoder will not (cannot) analyze the preceding stream to reconstruct the same state and continue where a previous encoder / decoder left off.

    Of course if you create a single gob.Encoder, you may use it to serialize as many values as you'd like to.

    Also you can create a gob.Encoder and write to a file, then later create a new gob.Encoder and append to the same file, but then you must use 2 gob.Decoders to read those values back, exactly matching the encoding process.

    As a demonstration, let's follow an example. This example will write to an in-memory buffer (bytes.Buffer). 2 subsequent encoders will write to it, then we will use 2 subsequent decoders to read the values. We'll write values of this struct:

    type Point struct {
        X, Y int
    }
    

    For short, compact code, I use this "error handler" function:

    func he(err error) {
        if err != nil {
            panic(err)
        }
    }
    

    And now the code:

    const n, m = 3, 2
    buf := &bytes.Buffer{}
    
    // First encoder: starts a gob stream, transmitting type info for Point.
    e := gob.NewEncoder(buf)
    for i := 0; i < n; i++ {
        he(e.Encode(&Point{X: i, Y: i * 2}))
    }
    
    // Second encoder: starts a brand new, independent gob stream in the
    // same buffer (type info for Point is transmitted again).
    e = gob.NewEncoder(buf)
    for i := 0; i < m; i++ {
        he(e.Encode(&Point{X: i, Y: 10 + i}))
    }
    
    // First decoder: reads exactly the n values of the first stream.
    d := gob.NewDecoder(buf)
    for i := 0; i < n; i++ {
        var p *Point
        he(d.Decode(&p))
        fmt.Println(p)
    }
    
    // Second decoder: a new decoder is required for the second stream.
    d = gob.NewDecoder(buf)
    for i := 0; i < m; i++ {
        var p *Point
        he(d.Decode(&p))
        fmt.Println(p)
    }
    

    Output (try it on the Go Playground):

    &{0 0}
    &{1 2}
    &{2 4}
    &{0 10}
    &{1 11}
    

    Note that if we used only 1 decoder to read all the values (looping until i < n + m), we'd get the same error message you posted in your question when the iteration reaches n + 1, because the subsequent data is not a serialized Point, but the start of a new gob stream.
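
    For reference, this is roughly what the failing single-decoder variant would look like (reusing he, n and m from the demo above):

    d := gob.NewDecoder(buf)
    for i := 0; i < n+m; i++ { // tries to read across the stream boundary
        var p *Point
        // Fails with the "extra data in buffer" error when reading the
        // (n+1)th value (i == n): the bytes there are the start of a new
        // gob stream, not a serialized Point.
        he(d.Decode(&p))
    }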

    So if you want to stick with the gob package, you have to slightly modify / enhance your encoding and decoding process: you have to somehow mark the boundaries where a new encoder is used, so that when decoding, you'll know you have to create a new decoder to read the subsequent values.

    You may use different techniques to achieve this:

    • You may write out a count before you proceed to write the values; this number tells how many values were written with the current encoder (see the sketch after this list).
    • If you don't want to or can't tell in advance how many values will be written with the current encoder, you may opt to write out a special end-of-encoder value once no more values will be written with the current encoder. When decoding, if you encounter this special end-of-encoder value, you'll know you have to create a new decoder to be able to read more values.
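
    Here is a minimal, self-contained sketch of the first technique (the count prefix). The file name points.gob and the function names appendPoints / readAllPoints are hypothetical; Point and he are the same as above, repeated so the sketch compiles on its own:

    package main
    
    import (
        "bufio"
        "encoding/gob"
        "fmt"
        "io"
        "os"
    )
    
    type Point struct {
        X, Y int
    }
    
    func he(err error) {
        if err != nil {
            panic(err)
        }
    }
    
    // appendPoints starts a fresh encoder (and thus a fresh gob stream),
    // writing the count first so readers know how many values follow.
    func appendPoints(name string, pts []Point) error {
        f, err := os.OpenFile(name, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
        if err != nil {
            return err
        }
        defer f.Close()
    
        e := gob.NewEncoder(f)
        if err := e.Encode(len(pts)); err != nil {
            return err
        }
        for i := range pts {
            if err := e.Encode(&pts[i]); err != nil {
                return err
            }
        }
        return nil
    }
    
    // readAllPoints creates a new decoder for each batch. All decoders
    // share one bufio.Reader: gob.NewDecoder would wrap a plain *os.File
    // in its own bufio.Reader, and bytes buffered there would be lost to
    // the next decoder.
    func readAllPoints(name string) ([]Point, error) {
        f, err := os.Open(name)
        if err != nil {
            return nil, err
        }
        defer f.Close()
    
        br := bufio.NewReader(f)
        var all []Point
        for {
            d := gob.NewDecoder(br)
            var n int
            if err := d.Decode(&n); err == io.EOF {
                return all, nil // clean end of file: no more batches
            } else if err != nil {
                return nil, err
            }
            for i := 0; i < n; i++ {
                var p Point
                if err := d.Decode(&p); err != nil {
                    return nil, err
                }
                all = append(all, p)
            }
        }
    }
    
    func main() {
        he(appendPoints("points.gob", []Point{{0, 0}, {1, 2}, {2, 4}}))
        he(appendPoints("points.gob", []Point{{0, 10}, {1, 11}}))
        pts, err := readAllPoints("points.gob")
        he(err)
        fmt.Println(pts)
    }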

    Some things to note here:

    • The gob package is most efficient and most compact if only a single encoder is used, because each time you create and use a new encoder, the type specifications have to be re-transmitted, causing more overhead and making the encoding / decoding process slower.
    • You can't seek in the data stream, you can only decode any value if you read the whole file from the beginning up until the value you want. Note that this somewhat applies even if you use other formats (such as JSON or XML).

    If you want seeking functionality, you'd need to manage an index file separately, which would tell at which positions new encoders / decoders start, so you could seek to that position, create a new decoder, and start reading values from there.
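
    As an illustration of that idea, here is a hypothetical readBatchAt function. It assumes the count-prefixed batch format from the sketch above, and that the given offset (loaded from your index file) is exactly where one of the encoders started its stream:

    // readBatchAt decodes one count-prefixed batch starting at offset,
    // which must be a stream boundary recorded in the index file.
    func readBatchAt(name string, offset int64) ([]Point, error) {
        f, err := os.Open(name)
        if err != nil {
            return nil, err
        }
        defer f.Close()
    
        // Jump straight to where the desired encoder's stream begins.
        if _, err := f.Seek(offset, io.SeekStart); err != nil {
            return nil, err
        }
    
        d := gob.NewDecoder(f) // a fresh decoder for this stream
        var n int
        if err := d.Decode(&n); err != nil {
            return nil, err
        }
        pts := make([]Point, n)
        for i := range pts {
            if err := d.Decode(&pts[i]); err != nil {
                return nil, err
            }
        }
        return pts, nil
    }

    The offsets themselves are easy to capture: before each append, record the file's current size (e.g. via os.Stat) as the starting offset of the new stream.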

    Check a related question: Efficient Go serialization of struct to disk


