dongluoqiu0255 2017-07-16 01:47
Viewed 69 times
Accepted

Understanding Linux write performance

I've been doing some benchmarking to try to understand write performance on Linux, and I don't understand the results I got. (I'm using ext4 on Ubuntu 17.04, though if anything I'm more interested in understanding ext4 than in comparing filesystems.)

Specifically, I understand that some databases/filesystems work by keeping a stale copy of your data and writing updates to a modification log. Periodically, the log is replayed over the stale data to produce a fresh version, which is then persisted. This only makes sense to me if appending to a file is faster than overwriting the whole file (otherwise, why write updates to a log? Why not just overwrite the data on disk?). I was curious how much faster appending is than overwriting, so I wrote a small benchmark in Go (https://gist.github.com/msteffen/08267045be42eb40900758c419c3bd38) and got these results:

$ go test ./write_test.go  -bench='.*'
BenchmarkWrite/Write_10_Bytes_10_times-8                30    46189788 ns/op
BenchmarkWrite/Write_100_Bytes_10_times-8               30    46477540 ns/op
BenchmarkWrite/Write_1000_Bytes_10_times-8              30    46214996 ns/op
BenchmarkWrite/Write_10_Bytes_100_times-8                3   458081572 ns/op
BenchmarkWrite/Write_100_Bytes_100_times-8               3   678916489 ns/op
BenchmarkWrite/Write_1000_Bytes_100_times-8              3   448888734 ns/op
BenchmarkWrite/Write_10_Bytes_1000_times-8               1  4579554906 ns/op
BenchmarkWrite/Write_100_Bytes_1000_times-8              1  4436367852 ns/op
BenchmarkWrite/Write_1000_Bytes_1000_times-8             1  4515641735 ns/op
BenchmarkAppend/Append_10_Bytes_10_times-8              30    43790244 ns/op
BenchmarkAppend/Append_100_Bytes_10_times-8             30    44581063 ns/op
BenchmarkAppend/Append_1000_Bytes_10_times-8            30    46399849 ns/op
BenchmarkAppend/Append_10_Bytes_100_times-8              3   452417883 ns/op
BenchmarkAppend/Append_100_Bytes_100_times-8             3   458258083 ns/op
BenchmarkAppend/Append_1000_Bytes_100_times-8            3   452616573 ns/op
BenchmarkAppend/Append_10_Bytes_1000_times-8             1  4504030390 ns/op
BenchmarkAppend/Append_100_Bytes_1000_times-8            1  4591249445 ns/op
BenchmarkAppend/Append_1000_Bytes_1000_times-8           1  4522205630 ns/op
PASS
ok    command-line-arguments  52.681s
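For reference, here is a rough sketch of the shape of such a benchmark. This is not the gist's actual code: the sizes, file handling, and sub-benchmark names are assumptions, and it deliberately reproduces the fixed repeat count that turned out to be the bug mentioned below.

package write

import (
    "fmt"
    "os"
    "testing"
)

func BenchmarkWrite(b *testing.B) {
    for _, size := range []int{10, 100, 1000} {
        for _, times := range []int{10, 100, 1000} {
            b.Run(fmt.Sprintf("Write_%d_Bytes_%d_times", size, times), func(b *testing.B) {
                f, err := os.CreateTemp(b.TempDir(), "bench")
                if err != nil {
                    b.Fatal(err)
                }
                data := make([]byte, size)
                // Bug: a fixed repeat count instead of b.N (see the answer below).
                for i := 0; i < times; i++ {
                    if _, err := f.WriteAt(data, 0); err != nil { // overwrite in place
                        b.Fatal(err)
                    }
                    if err := f.Sync(); err != nil { // fsync after every write
                        b.Fatal(err)
                    }
                }
            })
        }
    }
}

// BenchmarkAppend would be identical except for using f.Write(data),
// which advances the file offset, so each write lands after the last.

With a fixed count, ns/op grows roughly linearly with the repeat count (the tool divides total time by b.N, but the work done per call ignores b.N), which matches the roughly 10x jumps between the 10-, 100-, and 1000-times rows above.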

This left me with two questions that I couldn't think of an answer to:

1) Why does the time per operation go up so much when I go from 100 writes to 1,000? (I know Go repeats benchmarks for me, so doing multiple writes myself is probably silly, but since I got a weird result I'd like to understand why.) Edit: this was due to a bug in the Go test, which is now fixed.

2) Why isn't appending to a file faster than overwriting it? I thought the whole point of an update log was to take advantage of the comparative speed of appends. (Note that the current benchmark calls Sync() after every write; even if I don't, appends are still no faster than overwrites, though both are much faster overall.)

If any of the experts here could enlighten me, I would really appreciate it! Thanks!


1 Answer

  • dongshungai4857 2017-07-16 02:58

    About (1), I think the issue is that your benchmarks aren't doing what the Go tooling expects them to do.

    From the documentation (https://golang.org/pkg/testing/#hdr-Benchmarks):

    The benchmark function must run the target code b.N times. During benchmark execution, b.N is adjusted until the benchmark function lasts long enough to be timed reliably.

    I don't see your code using b.N, so while the benchmark tool thinks you run the code b.N times, you are actually managing the repeats yourself. Depending on the values the tool actually uses for b.N, the results will vary unpredictably.

    You can still do things 10, 100, and 1,000 times, but in every case do them b.N times (that is, b.N * 10, b.N * 100, and so on) so that the reported per-operation time is computed properly.
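    A minimal sketch of that corrected shape (the benchmark name and sizes here are illustrative, not the asker's actual fix):

    package write

    import (
        "os"
        "testing"
    )

    func BenchmarkWrite100Times(b *testing.B) {
        f, err := os.CreateTemp(b.TempDir(), "bench")
        if err != nil {
            b.Fatal(err)
        }
        data := make([]byte, 100)
        b.ResetTimer()
        for i := 0; i < b.N; i++ { // b.N is scaled by the tool until timing is reliable
            for j := 0; j < 100; j++ { // the fixed "100 times" now runs b.N times
                if _, err := f.WriteAt(data, 0); err != nil {
                    b.Fatal(err)
                }
                if err := f.Sync(); err != nil {
                    b.Fatal(err)
                }
            }
        }
    }

    With this shape, the tool can raise b.N until the run lasts long enough to time reliably, and the per-operation numbers stay comparable across cases.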

    About (2): when some systems use a sequential log to store operations and then replay them, it's not because appending to a file is faster than overwriting a single file.

    In a database system, if you need to update a specific record, you must first find the actual file (and the position within that file) that you need to update.

    That might require several index lookups, and once you update the record, you might need to update those indexes to reflect the new values.

    So the right comparison is appending to a single log versus making several reads followed by several writes.
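    To make that comparison concrete, here is a sketch of the two access patterns. The in-memory map standing in for an index and the function names are invented for illustration; a real database's on-disk index would add extra reads as well as writes.

    package db

    import (
        "fmt"
        "os"
    )

    // updateInPlace: find the record's location, overwrite it, and then
    // every index referencing the record must be updated too.
    func updateInPlace(data *os.File, index map[string]int64, key string, rec []byte) error {
        off, ok := index[key] // a lookup before you can write anything
        if !ok {
            return fmt.Errorf("key %q not found", key)
        }
        if _, err := data.WriteAt(rec, off); err != nil { // random write
            return err
        }
        // ...followed by more writes for each affected index.
        return data.Sync()
    }

    // appendToLog: no lookup at all; the operation goes at the end of the
    // log, and the stale data is reconciled later during replay.
    func appendToLog(logFile *os.File, rec []byte) error {
        if _, err := logFile.Write(rec); err != nil { // one sequential write
            return err
        }
        return logFile.Sync()
    }

    This is why log-structured systems can still win even though, as your benchmark shows, a single append costs about the same as a single overwrite.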

    This answer was accepted by the asker.
