dsns47611 2018-04-10 03:23
浏览 100
已采纳

嵌入式键值数据库与每个键仅存储一个文件?

I'm confused about the advantage of embedded key-value databases over the naive solution of just storing one file on disk per key. For example, databases like RocksDB, Badger, SQLite use fancy data structures like B+ trees and LSMs but seem to get roughly the same performance as this simple solution.

For example, Badger (which is the fastest Go embedded db) takes about 800 microseconds to write an entry. In comparison, creating a new file from scratch and writing some data to it takes 150 mics with no optimization.

EDIT: to clarify, here's the simple implementation of a key-value store I'm comparing with the state of the art embedded dbs. Just hash each key to a string filename, and store the associated value as a byte array at that filename. Reads and writes are ~150 mics each, which is faster than Badger for single operations and comparable for batched operations. Furthermore, the disk space is minimal, since we don't store any extra structure besides the actual values.

I must be missing something here, because the solutions people actually use are super fancy and optimized using things like bloom filters and B+ trees.

  • 写回答

1条回答 默认 最新

  • doutale7115 2018-04-10 04:38
    关注

    But Badger is not about writing "an" entry:

    My writes are really slow. Why?

    Are you creating a new transaction for every single key update? This will lead to very low throughput.

    To get best write performance, batch up multiple writes inside a transaction using single DB.Update() call.
    You could also have multiple such DB.Update() calls being made concurrently from multiple goroutines.

    That leads to issue 396:

    I was looking for fast storage in Go and so my first try was BoltDB. I need a lot of single-write transactions. Bolt was able to do about 240 rq/s.

    I just tested Badger and I got a crazy 10k rq/s. I am just baffled

    That is because:

    LSM tree has an advantage compared to B+ tree when it comes to writes.
    Also, values are stored separately in value log files so writes are much faster.

    You can read more about the design here.


    One of the main point (hard to replicate with simple read/write of files) is:

    Key-Value separation

    The major performance cost of LSM-trees is the compaction process. During compactions, multiple files are read into memory, sorted, and written back. Sorting is essential for efficient retrieval, for both key lookups and range iterations. With sorting, the key lookups would only require accessing at most one file per level (excluding level zero, where we’d need to check all the files). Iterations would result in sequential access to multiple files.

    Each file is of fixed size, to enhance caching. Values tend to be larger than keys. When you store values along with the keys, the amount of data that needs to be compacted grows significantly.

    In Badger, only a pointer to the value in the value log is stored alongside the key. Badger employs delta encoding for keys to reduce the effective size even further. Assuming 16 bytes per key and 16 bytes per value pointer, a single 64MB file can store two million key-value pairs.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥20 关于#硬件工程#的问题,请各位专家解答!
  • ¥15 关于#matlab#的问题:期望的系统闭环传递函数为G(s)=wn^2/s^2+2¢wn+wn^2阻尼系数¢=0.707,使系统具有较小的超调量
  • ¥15 FLUENT如何实现在堆积颗粒的上表面加载高斯热源
  • ¥30 截图中的mathematics程序转换成matlab
  • ¥15 动力学代码报错,维度不匹配
  • ¥15 Power query添加列问题
  • ¥50 Kubernetes&Fission&Eleasticsearch
  • ¥15 報錯:Person is not mapped,如何解決?
  • ¥15 c++头文件不能识别CDialog
  • ¥15 Excel发现不可读取的内容