dongluoqiu0255 2017-07-16 01:47
Accepted

Understanding Linux write performance

I've been doing some benchmarking to try and understand write performance on Linux, and I don't understand the results I got (I'm using ext4 on Ubuntu 17.04, though I'm more interested in understanding ext4 if anything, than I am in comparing filesystems).

Specifically, I understand that some databases/filesystems work by keeping a stale copy of your data and writing updates to a modification log. Periodically, the log is replayed over the stale data to produce a fresh version, which is then persisted. This only makes sense to me if appending to a file is faster than overwriting the whole file (otherwise, why write updates to a log? Why not just overwrite the data on disk?). I was curious how much faster appending is than overwriting, so I wrote a small benchmark in Go (https://gist.github.com/msteffen/08267045be42eb40900758c419c3bd38) and got these results:

$ go test ./write_test.go  -bench='.*'
BenchmarkWrite/Write_10_Bytes_10_times-8                30    46189788 ns/op
BenchmarkWrite/Write_100_Bytes_10_times-8               30    46477540 ns/op
BenchmarkWrite/Write_1000_Bytes_10_times-8              30    46214996 ns/op
BenchmarkWrite/Write_10_Bytes_100_times-8                3   458081572 ns/op
BenchmarkWrite/Write_100_Bytes_100_times-8               3   678916489 ns/op
BenchmarkWrite/Write_1000_Bytes_100_times-8              3   448888734 ns/op
BenchmarkWrite/Write_10_Bytes_1000_times-8               1  4579554906 ns/op
BenchmarkWrite/Write_100_Bytes_1000_times-8              1  4436367852 ns/op
BenchmarkWrite/Write_1000_Bytes_1000_times-8             1  4515641735 ns/op
BenchmarkAppend/Append_10_Bytes_10_times-8              30    43790244 ns/op
BenchmarkAppend/Append_100_Bytes_10_times-8             30    44581063 ns/op
BenchmarkAppend/Append_1000_Bytes_10_times-8            30    46399849 ns/op
BenchmarkAppend/Append_10_Bytes_100_times-8              3   452417883 ns/op
BenchmarkAppend/Append_100_Bytes_100_times-8             3   458258083 ns/op
BenchmarkAppend/Append_1000_Bytes_100_times-8            3   452616573 ns/op
BenchmarkAppend/Append_10_Bytes_1000_times-8             1  4504030390 ns/op
BenchmarkAppend/Append_100_Bytes_1000_times-8            1  4591249445 ns/op
BenchmarkAppend/Append_1000_Bytes_1000_times-8           1  4522205630 ns/op
PASS
ok    command-line-arguments  52.681s
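
For reference, the core of each case looks roughly like this (a simplified sketch of what the gist does; the helper names are mine, not taken from the gist):

    package bench

    import "os"

    // overwrite rewrites the same bytes at offset 0 on every iteration,
    // syncing after each write (as the real benchmark does).
    func overwrite(path string, data []byte, times int) error {
        f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE, 0644)
        if err != nil {
            return err
        }
        defer f.Close()
        for i := 0; i < times; i++ {
            if _, err := f.WriteAt(data, 0); err != nil {
                return err
            }
            if err := f.Sync(); err != nil {
                return err
            }
        }
        return nil
    }

    // appendTo writes the same bytes at the end of the file on every
    // iteration, again syncing after each write.
    func appendTo(path string, data []byte, times int) error {
        f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_APPEND, 0644)
        if err != nil {
            return err
        }
        defer f.Close()
        for i := 0; i < times; i++ {
            if _, err := f.Write(data); err != nil {
                return err
            }
            if err := f.Sync(); err != nil {
                return err
            }
        }
        return nil
    }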

This left me with two questions that I couldn't think of an answer to:

1) Why does the time per operation go up so much when I go from 100 writes to 1000? (I know Go repeats benchmarks for me, so doing multiple writes myself is probably silly, but since I got a weird result I'd like to understand why.) Edit: this was due to a bug in the Go test, which is now fixed.

2) Why isn't appending to a file faster than overwriting it? I thought the whole point of the update log was to take advantage of the comparative speed of appends? (Note that the current bench calls Sync() after every write, but even if I don't do that, appends are no faster than overwrites, though both are much faster overall.)

If any of the experts here could enlighten me, I would really appreciate it! Thanks!

1 answer

dongshungai4857 2017-07-16 02:58

    About (1), I think the issue is related to your benchmarks not doing what the Go tools expect them to do.

    From the documentation (https://golang.org/pkg/testing/#hdr-Benchmarks):

    The benchmark function must run the target code b.N times. During benchmark execution, b.N is adjusted until the benchmark function lasts long enough to be timed reliably.

    I don't see your code using b.N, so while the benchmark tool assumes you run the code b.N times, you are managing the repetitions yourself. Depending on the value the tool actually chooses for b.N, the results will vary unpredictably.

    You can still do things 10, 100, and 1,000 times, but in all cases do them b.N times in total (that is, b.N * 10, b.N * 100, etc.) so that the reported per-operation time is computed properly.
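
    For example, a fixed benchmark might look like this (a sketch; writeOnce is a hypothetical stand-in for one write+Sync of the original benchmark body):

        package bench_test

        import "testing"

        // writeOnce is a hypothetical stand-in for a single write of the
        // original benchmark.
        func writeOnce() {}

        // Still the "100 writes" variant, but repeated b.N times so the
        // testing package can grow b.N until the timing is reliable; the
        // reported ns/op is then the cost of one group of 100 writes.
        func BenchmarkWrite_100_times(b *testing.B) {
            for i := 0; i < b.N; i++ {
                for j := 0; j < 100; j++ {
                    writeOnce()
                }
            }
        }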

    About (2), when some systems use a sequential log to store operations and then replay them, it's not because appending to a file is faster than overwriting a single file.

    In a database system, if you need to update a specific record, you must first find the actual file (and the position within that file) that you need to update.

    That might require several index lookups, and once you update the record, you might need to update those indexes to reflect the new values.

    So the right comparison is appending to a single log versus making several reads followed by several writes.
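
    Schematically, the trade-off looks something like this (a toy sketch to illustrate the idea, not any real database's on-disk format):

        package wal

        import (
            "encoding/binary"
            "os"
        )

        // logUpdate appends an (offset, length, value) record to a log:
        // one sequential write plus a sync, and no lookups at all.
        func logUpdate(log *os.File, offset int64, value []byte) error {
            var hdr [12]byte
            binary.LittleEndian.PutUint64(hdr[:8], uint64(offset))
            binary.LittleEndian.PutUint32(hdr[8:], uint32(len(value)))
            if _, err := log.Write(append(hdr[:], value...)); err != nil {
                return err
            }
            return log.Sync()
        }

        // inPlaceUpdate writes the value directly at its position in the
        // data file; in a real database, finding that position (and then
        // updating any indexes) costs the extra reads and writes described
        // above.
        func inPlaceUpdate(data *os.File, offset int64, value []byte) error {
            if _, err := data.WriteAt(value, offset); err != nil {
                return err
            }
            return data.Sync()
        }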
