dongping4901 2016-04-03 13:26

Retrieving gobs written to a file by appending several times

I am trying to use encoding/gob to store data to a file and load it later. I want to be able to append new data to the file and load all saved data later, e.g. after restarting my application. Storing to the file with Encode() causes no problems, but when reading I always seem to get only the items that were stored first, not the subsequently stored items.

Here is a minimal example: https://play.golang.org/p/patGkKDLhM

As you can see, writing twice to one encoder and then reading both values back works. But after closing the file and reopening it in append mode, writing seems to work while reading works only for the first two elements (the ones written previously). The two newly added structs cannot be retrieved; I get the error:

panic: extra data in buffer

I am aware of the question "Append to golang gob in a file on disk", and I also read https://groups.google.com/forum/#!topic/golang-nuts/bn6vjC5Abd8

Finally, I also found https://gist.github.com/kjk/8015952 which seems to demonstrate that what I am trying to do does not work. Why? What does this error mean?


  • droos02800 2016-04-03 14:43

    I have not used the encoding/gob package yet (looks cool, I might have to find a project for it). But reading the godoc, it seems to me that each encoding is a single record that is expected to be decoded from beginning to end. That is, once you Encode a stream, the resulting bytes are a complete set representing the entire stream from start to finish, and cannot be appended to later by encoding again.

    The godoc states that an encoded gob stream is self-describing. At the beginning of the encoded stream, it describes the data that will follow: the struct, its types, and so on, including the field names. What follows in the byte stream is the size and byte representation of the values of those exported fields.

    One could then assume what the docs leave unsaid: since the stream describes itself at the very beginning, including each field that is about to be passed, that description is all the Decoder cares about. The Decoder does not expect any further bytes added after what was described at the beginning. Therefore, the error message panic: extra data in buffer is accurate.

    In your Playground example, you encode twice with the same encoder instance and then close the file. Since you pass in exactly two records and encode both with that one encoder, this may work because the single encoder instance treats the two Encode calls as one encoded stream. Once you close the file, the gob is complete, and the stream is treated as a single record (even though you sent in two values).

    The same applies in the decoding function: you read X number of times from the same stream. But you wrote a single record before closing the file, one that actually contains two values. That is why it works when reading 2, and EXACTLY 2, but fails when reading more than 2.
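
    The behaviour described above can be reproduced without touching the disk. The following is a minimal sketch, not the code from the Playground link: the Item type and the helper names are hypothetical, and a bytes.Buffer stands in for the file. Each call to writeSession creates a fresh encoder, which mimics reopening the file in append mode; a single decoder then reads until it fails.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Item is a hypothetical record type standing in for the struct in the question.
type Item struct {
	Name  string
	Value int
}

// writeSession simulates one "open file, encode, close" cycle: a fresh
// encoder appends all given items to the shared buffer.
func writeSession(buf *bytes.Buffer, items ...Item) error {
	enc := gob.NewEncoder(buf)
	for _, it := range items {
		if err := enc.Encode(it); err != nil {
			return err
		}
	}
	return nil
}

// appendAndReadBack writes two sessions of two items each, then reads with a
// single decoder. It returns how many items decoded and the error that
// stopped the loop.
func appendAndReadBack() (int, error) {
	var buf bytes.Buffer
	if err := writeSession(&buf, Item{"a", 1}, Item{"b", 2}); err != nil {
		return 0, err
	}
	// Second session: a new encoder, as after reopening in append mode.
	if err := writeSession(&buf, Item{"c", 3}, Item{"d", 4}); err != nil {
		return 0, err
	}

	dec := gob.NewDecoder(&buf)
	count := 0
	for {
		var it Item
		if err := dec.Decode(&it); err != nil {
			return count, err
		}
		count++
	}
}

func main() {
	n, err := appendAndReadBack()
	fmt.Printf("decoded %d items, then stopped with: %v\n", n, err)
}
```

    When run, this should report two decoded items followed by a non-nil error. The exact error text may differ between Go versions, so the sketch does not depend on it.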

    A solution, if you want to store this in a single file, is to create your own index of each complete "write", that is, each encoder instance/session. Use some form of your own block method that wraps each entry written to disk with a "begin" and "end" marker. That way, when reading the file back, you know exactly what buffer to allocate because of the begin/end markers. Once you have a single record in a buffer, use gob's Decoder to decode it. And close the file after each write.

    The pattern I use for such markers is something like:

    uint64:uint64
    uint64:uint64
    ...
    

    The first number is the starting byte offset, and the second, separated by a colon, is the record's length. I usually store this in another file though, appropriately called indexes. That way it can be quickly read into memory, and then I can stream the large file knowing exactly where each start and end address is in the byte stream.
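
    The offset:length idea can be sketched as follows. This is only an illustration under assumptions: the store type, its fields, and the Item type are hypothetical, and in-memory slices stand in for the data file and the separate index file. Every record gets its own encoder on write and its own decoder on read, so no decoder ever sees more than one complete gob.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Item is a hypothetical record type used only for illustration.
type Item struct {
	Name  string
	Value int
}

// store is a hypothetical in-memory stand-in for the two files.
type store struct {
	data  []byte   // the large data file
	index []string // the index file: one "offset:length" line per record
}

// add encodes item with its own encoder, appends the bytes to the data,
// and records where they start and how long they are.
func (s *store) add(item Item) error {
	var rec bytes.Buffer
	if err := gob.NewEncoder(&rec).Encode(item); err != nil {
		return err
	}
	s.index = append(s.index, fmt.Sprintf("%d:%d", len(s.data), rec.Len()))
	s.data = append(s.data, rec.Bytes()...)
	return nil
}

// readAll walks the index and decodes each record with a fresh decoder.
func (s *store) readAll() ([]Item, error) {
	var out []Item
	for _, line := range s.index {
		var off, n uint64
		if _, err := fmt.Sscanf(line, "%d:%d", &off, &n); err != nil {
			return nil, err
		}
		if off+n > uint64(len(s.data)) {
			return nil, fmt.Errorf("index entry %q points past end of data", line)
		}
		var it Item
		dec := gob.NewDecoder(bytes.NewReader(s.data[off : off+n]))
		if err := dec.Decode(&it); err != nil {
			return nil, err
		}
		out = append(out, it)
	}
	return out, nil
}

// indexRoundTrip appends four records one at a time and reads them all back.
func indexRoundTrip() ([]Item, error) {
	var s store
	for _, it := range []Item{{"a", 1}, {"b", 2}, {"c", 3}, {"d", 4}} {
		if err := s.add(it); err != nil {
			return nil, err
		}
	}
	return s.readAll()
}

func main() {
	items, err := indexRoundTrip()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(items)
}
```

    With real files, the same approach would append to the data file and the index file on each write, and seek to each offset when reading. The cost is that every record repeats its own type description.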

    Another option is simply to store each gob in its own file, using the file system directory structure to organize them as you see fit (one could even use the directories to define types, for example). Each file is then a single record. This is how I use my rendered json from Event Sourcing techniques, storing millions of files organized in directories.
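
    A rough sketch of the one-file-per-gob layout, assuming a flat directory and a zero-padded sequence number as the file name. The naming scheme, the Item type, and the helper names are all hypothetical choices made for this example.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// Item is a hypothetical record type used only for illustration.
type Item struct {
	Name  string
	Value int
}

// saveItem writes one item to its own file inside dir.
func saveItem(dir string, seq int, item Item) error {
	f, err := os.Create(filepath.Join(dir, fmt.Sprintf("%08d.gob", seq)))
	if err != nil {
		return err
	}
	if err := gob.NewEncoder(f).Encode(item); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// loadItems decodes every *.gob file in dir, in file-name order,
// using a fresh decoder per file.
func loadItems(dir string) ([]Item, error) {
	paths, err := filepath.Glob(filepath.Join(dir, "*.gob"))
	if err != nil {
		return nil, err
	}
	sort.Strings(paths)
	var out []Item
	for _, p := range paths {
		f, err := os.Open(p)
		if err != nil {
			return nil, err
		}
		var it Item
		err = gob.NewDecoder(f).Decode(&it)
		f.Close()
		if err != nil {
			return nil, err
		}
		out = append(out, it)
	}
	return out, nil
}

// perFileRoundTrip saves three items into a temporary directory and loads them back.
func perFileRoundTrip() ([]Item, error) {
	dir, err := os.MkdirTemp("", "gobs")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	for i, it := range []Item{{"a", 1}, {"b", 2}, {"c", 3}} {
		if err := saveItem(dir, i, it); err != nil {
			return nil, err
		}
	}
	return loadItems(dir)
}

func main() {
	items, err := perFileRoundTrip()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(items)
}
```

    Adding a record later is then just another saveItem call with the next sequence number; nothing already on disk is reopened or appended to.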

    In summary, it would seem to me that a gob of data is a complete set of data from beginning to end, a single "record" if you will. If you want to store multiple encodings/multiple gobs, then you will need to create your own index to track the start and size/end of each gob's bytes as you store them. Then you will want to Decode each entry separately.
