嵌入式键值数据库与每个键仅存储一个文件？

I'm confused about the advantage of embedded key-value databases over the naive solution of just storing one file on disk per key. For example, databases like RocksDB, Badger, SQLite use fancy data structures like B+ trees and LSMs but seem to get roughly the same performance as this simple solution.

For example, Badger (which is the fastest Go embedded db) takes about 800 microseconds to write an entry. In comparison, creating a new file from scratch and writing some data to it takes 150 mics with no optimization.

EDIT: to clarify, here's the simple implementation of a key-value store I'm comparing with the state of the art embedded dbs. Just hash each key to a string filename, and store the associated value as a byte array at that filename. Reads and writes are ~150 mics each, which is faster than Badger for single operations and comparable for batched operations. Furthermore, the disk space is minimal, since we don't store any extra structure besides the actual values.

I must be missing something here, because the solutions people actually use are super fancy and optimized using things like bloom filters and B+ trees.

写回答
好问题 0 提建议
追加酬金
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

1条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
doutale7115 2018-04-10 04:38
关注
But Badger is not about writing "an" entry:

My writes are really slow. Why?

Are you creating a new transaction for every single key update? This will lead to very low throughput.

To get best write performance, batch up multiple writes inside a transaction using single DB.Update() call.
You could also have multiple such DB.Update() calls being made concurrently from multiple goroutines.

That leads to issue 396:

I was looking for fast storage in Go and so my first try was BoltDB. I need a lot of single-write transactions. Bolt was able to do about 240 rq/s.

I just tested Badger and I got a crazy 10k rq/s. I am just baffled

That is because:

LSM tree has an advantage compared to B+ tree when it comes to writes.
Also, values are stored separately in value log files so writes are much faster.

You can read more about the design here.

One of the main point (hard to replicate with simple read/write of files) is:

Key-Value separation

The major performance cost of LSM-trees is the compaction process. During compactions, multiple files are read into memory, sorted, and written back. Sorting is essential for efficient retrieval, for both key lookups and range iterations. With sorting, the key lookups would only require accessing at most one file per level (excluding level zero, where we’d need to check all the files). Iterations would result in sequential access to multiple files.

Each file is of fixed size, to enhance caching. Values tend to be larger than keys. When you store values along with the keys, the amount of data that needs to be compacted grows significantly.

In Badger, only a pointer to the value in the value log is stored alongside the key. Badger employs delta encoding for keys to reduce the effective size even further. Assuming 16 bytes per key and 16 bytes per value pointer, a single 64MB file can store two million key-value pairs.

本回答被题主选为最佳回答 , 对您是否有帮助呢?

解决无用
评论打赏
分享
举报

评论

按下Enter换行，Ctrl+Enter发表内容

报告相同问题？

关注问题

嵌入式键值数据库与每个键仅存储一个文件？ sqlite
2018-04-10 03:23

回答 1 已采纳 But Badger is not about writing "an" entry: My writes are really slow. Why? Are you creat
请问这个嵌入式的代码如何理解呢？ c语言嵌入式实时数据库嵌入式硬件有问必答
2021-11-24 19:47

回答 2 已采纳 ……哪行看不懂？注释不是很全了吗？openorcloseLed(1); 利用ioctl对每个关闭LEDloadbmp()；打开bmp文件，跳过文件头，读取了三通道数据，放到四通道的结构体里面。然后做一
热电阻与热敏电阻不是同一个东西吗？嵌入式硬件
2022-02-08 19:08

回答 2 已采纳将温度转换为电势变化，应用了热电效应→热电偶将温度转换为电阻变化，金属材料为热电阻；半导体材料为热敏电阻温度转为PN结伏安特性曲线变化→集成温度传感器工程上选型参考下表
大数据技术原理与应用第二篇 大数据存储与管理（二）NoSQL数据库和云数据库
2022-11-17 20:53

月望曦的博客 大数据技术原理与应用第二篇 大数据存储与管理中 NoSQL数据库和云数据库 知识点总结与理解
人脑是一个什么类型的处理器？ opencv python 嵌入式硬件
2022-09-07 09:02

回答 2 已采纳 1.世界的频率比你想象的高太多倍了，普朗克时间了解一下2.视神经的频率大概是0.1s，所以人眼会有0.1s的视觉残留，最早的电影用16帧的频率就能让人觉得不闪，后来改成24帧是因为人耳朵更灵敏，加入音
请问想FreeRTOS和UCOS这种嵌入式系统是怎么从每个任务的死循环跳出执行下一个任务的 c语言单片机嵌入式硬件有问必答
2021-12-27 19:20

回答 1 已采纳任务调度方式很多，以时间片轮为例。系统有个tick时钟，tick中断发生的时候，系统的任务调度器就保存当前正在执行的任务的堆栈和程序指针等信息，装入下一个任务的信息，这样就切换到另一个任务了。
一个低级app 文件的问题bin c语言嵌入式硬件
2021-10-27 18:11

回答 1 已采纳肯定只需要bin文件啊，项目文件有什么用，难道单片机自己编译一次？具体操作流程看这个IAP的说明文档，这个毕竟都是自定义的，不是通用的
键值数据库LevelDB的优缺点及性能分析
2021-11-27 20:15

大数据v的博客导读：LevelDB是一种为分布式而生的键-值数据库。作者：廖环宇张仕华来源：大数据DT（ID：hzdashuju）01 LevelDB的特性LevelDB是一个C++语言编写的高效键-...
Verilog如何实现同一个RAM中存放不同的数据？ fpga开发嵌入式硬件开发语言
2022-02-17 14:38

回答 2 已采纳图文看得我有点疑惑，ram的数据位宽明显是14位啊，8位最大数值只能到255。题主你想表达的是不是，一个数据位宽为14bit的ram，8个一组读写（一组数据位宽为14*8=112bit），共2268
一个ros与机械臂的问题人工智能嵌入式硬件语音识别
2022-06-08 11:19

回答 1 已采纳可以参考：https://docs.elephantrobotics.com/docs/gitbook/13-AdvancedKit/13.5-%E5%9B%BE%E5%83%8F%E8%AF%86%
毕业两年后可以学习嵌入式开发吗？ arm 嵌入式实时数据库
2022-08-30 10:36

回答 4 已采纳任何时候都可以自学
键值数据库虽小，Redis却能雄踞数据库排行前十
2021-08-30 19:19

Z1Y492Vn3ZYD9et3B06的博客键值数据库可以说是最简单的数据库了，只能存储成对的键和值，并在知道键时检索值。这样简单的数据库通常不足以满足复杂的应用。但正是这样的简单性，使得键值数据库在某些应用情况下更具有吸引力。例如...
如何为自己制作一个简单的组卷系统 sqlserver 嵌入式实时数据库时序数据库
2021-07-27 18:20

回答 5 已采纳用文件方式管理比较方便！每个试题保存为一个压缩包，名称为[知识点1]-[知识点2]...-序号.zip，编写程序，根据知识点和数量随机选取文件名。然后通过latex技术，将试题生成pdf试卷。
大数据技术原理与应用（第五章 NoSQL数据库）
2022-09-09 10:32

m0_37607242的博客 NoSQL数据库
大数据存储技术（4）—— NoSQL数据库
2023-12-20 12:49

Francek Chen的博客 NoSQL数据库适用于数据模型比较简单、IT系统更强的灵活性、对数据库性能要求较高且不需要高度的数据一致性等场景。本篇文章简单介绍常见的NoSQL数据库类型。
没有解决我的问题, 去提问

悬赏问题

¥20 关于#硬件工程#的问题，请各位专家解答！
¥15 关于#matlab#的问题：期望的系统闭环传递函数为G(s)=wn^2/s^2+2¢wn+wn^2阻尼系数¢=0.707，使系统具有较小的超调量
¥15 FLUENT如何实现在堆积颗粒的上表面加载高斯热源
¥30 截图中的mathematics程序转换成matlab
¥15 动力学代码报错，维度不匹配
¥15 Power query添加列问题
¥50 Kubernetes&Fission&Eleasticsearch
¥15 報錯：Person is not mapped，如何解決？
¥15 c++头文件不能识别CDialog
¥15 Excel发现不可读取的内容

嵌入式键值数据库与每个键仅存储一个文件？

1条回答 默认 最新

My writes are really slow. Why?

Key-Value separation

悬赏问题

1条回答默认最新