dongluoqiu0255 2017-07-16 01:47
Accepted

Understanding Linux write performance

I've been doing some benchmarking to try and understand write performance on Linux, and I don't understand the results I got (I'm using ext4 on Ubuntu 17.04, though I'm more interested in understanding ext4 if anything, than I am in comparing filesystems).

Specifically, I understand that some databases/filesystems work by keeping a stale copy of your data and writing updates to a modification log. Periodically, the log is replayed over the stale data to produce a fresh version, which is then persisted. This only makes sense to me if appending to a file is faster than overwriting the whole file (otherwise, why write updates to a log? Why not just overwrite the data on disk?). I was curious how much faster appending is than overwriting, so I wrote a small benchmark in Go (https://gist.github.com/msteffen/08267045be42eb40900758c419c3bd38) and got these results:

$ go test ./write_test.go  -bench='.*'
BenchmarkWrite/Write_10_Bytes_10_times-8                30    46189788 ns/op
BenchmarkWrite/Write_100_Bytes_10_times-8               30    46477540 ns/op
BenchmarkWrite/Write_1000_Bytes_10_times-8              30    46214996 ns/op
BenchmarkWrite/Write_10_Bytes_100_times-8                3   458081572 ns/op
BenchmarkWrite/Write_100_Bytes_100_times-8               3   678916489 ns/op
BenchmarkWrite/Write_1000_Bytes_100_times-8              3   448888734 ns/op
BenchmarkWrite/Write_10_Bytes_1000_times-8               1  4579554906 ns/op
BenchmarkWrite/Write_100_Bytes_1000_times-8              1  4436367852 ns/op
BenchmarkWrite/Write_1000_Bytes_1000_times-8             1  4515641735 ns/op
BenchmarkAppend/Append_10_Bytes_10_times-8              30    43790244 ns/op
BenchmarkAppend/Append_100_Bytes_10_times-8             30    44581063 ns/op
BenchmarkAppend/Append_1000_Bytes_10_times-8            30    46399849 ns/op
BenchmarkAppend/Append_10_Bytes_100_times-8              3   452417883 ns/op
BenchmarkAppend/Append_100_Bytes_100_times-8             3   458258083 ns/op
BenchmarkAppend/Append_1000_Bytes_100_times-8            3   452616573 ns/op
BenchmarkAppend/Append_10_Bytes_1000_times-8             1  4504030390 ns/op
BenchmarkAppend/Append_100_Bytes_1000_times-8            1  4591249445 ns/op
BenchmarkAppend/Append_1000_Bytes_1000_times-8           1  4522205630 ns/op
PASS
ok    command-line-arguments  52.681s
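
For reference, the core of each case looks roughly like this (a simplified sketch of what the gist does; the helper names are mine, not taken from the gist):

    package bench

    import "os"

    // overwrite rewrites the same bytes at offset 0 on every iteration,
    // syncing after each write (as the real benchmark does).
    func overwrite(path string, data []byte, times int) error {
        f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE, 0644)
        if err != nil {
            return err
        }
        defer f.Close()
        for i := 0; i < times; i++ {
            if _, err := f.WriteAt(data, 0); err != nil {
                return err
            }
            if err := f.Sync(); err != nil {
                return err
            }
        }
        return nil
    }

    // appendTo writes the same bytes at the end of the file on every
    // iteration, again syncing after each write.
    func appendTo(path string, data []byte, times int) error {
        f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_APPEND, 0644)
        if err != nil {
            return err
        }
        defer f.Close()
        for i := 0; i < times; i++ {
            if _, err := f.Write(data); err != nil {
                return err
            }
            if err := f.Sync(); err != nil {
                return err
            }
        }
        return nil
    }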

This left me with two questions that I couldn't think of an answer to:

1) Why does the time per operation go up so much when I go from 100 writes to 1000? (I know Go repeats benchmarks for me, so doing multiple writes myself is probably silly, but since I got a weird result I'd like to understand why.) Edit: this was due to a bug in the Go test, which is now fixed.

2) Why isn't appending to a file faster than overwriting it? I thought the whole point of the update log was to take advantage of the comparative speed of appends? (Note that the current bench calls Sync() after every write, but even if I don't do that, appends are no faster than overwrites, though both are much faster overall.)

If any of the experts here could enlighten me, I would really appreciate it! Thanks!

1 answer

dongshungai4857 2017-07-16 02:58

    About (1), I think the issue is related to your benchmarks not doing what the Go tools expect them to do.

    From the documentation (https://golang.org/pkg/testing/#hdr-Benchmarks):

    The benchmark function must run the target code b.N times. During benchmark execution, b.N is adjusted until the benchmark function lasts long enough to be timed reliably.

    I don't see your code using b.N, so while the benchmark tool assumes you run the code b.N times, you are managing the repetitions yourself. Depending on the value the tool actually chooses for b.N, the results will vary unpredictably.

    You can still do things 10, 100, and 1,000 times, but in all cases do them b.N times in total (that is, b.N * 10, b.N * 100, etc.) so that the reported per-operation time is computed properly.
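
    For example, a fixed benchmark might look like this (a sketch; writeOnce is a hypothetical stand-in for one write+Sync of the original benchmark body):

        package bench_test

        import "testing"

        // writeOnce is a hypothetical stand-in for a single write of the
        // original benchmark.
        func writeOnce() {}

        // Still the "100 writes" variant, but repeated b.N times so the
        // testing package can grow b.N until the timing is reliable; the
        // reported ns/op is then the cost of one group of 100 writes.
        func BenchmarkWrite_100_times(b *testing.B) {
            for i := 0; i < b.N; i++ {
                for j := 0; j < 100; j++ {
                    writeOnce()
                }
            }
        }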

    About (2), when some systems use a sequential log to store operations and then replay them, it's not because appending to a file is faster than overwriting a single file.

    In a database system, if you need to update a specific record, you must first find the actual file (and the position within that file) that you need to update.

    That might require several index lookups, and once you update the record, you might need to update those indexes to reflect the new values.

    So the right comparison is appending to a single log versus making several reads followed by several writes.
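
    Schematically, the trade-off looks something like this (a toy sketch to illustrate the idea, not any real database's on-disk format):

        package wal

        import (
            "encoding/binary"
            "os"
        )

        // logUpdate appends an (offset, length, value) record to a log:
        // one sequential write plus a sync, and no lookups at all.
        func logUpdate(log *os.File, offset int64, value []byte) error {
            var hdr [12]byte
            binary.LittleEndian.PutUint64(hdr[:8], uint64(offset))
            binary.LittleEndian.PutUint32(hdr[8:], uint32(len(value)))
            if _, err := log.Write(append(hdr[:], value...)); err != nil {
                return err
            }
            return log.Sync()
        }

        // inPlaceUpdate writes the value directly at its position in the
        // data file; in a real database, finding that position (and then
        // updating any indexes) costs the extra reads and writes described
        // above.
        func inPlaceUpdate(data *os.File, offset int64, value []byte) error {
            if _, err := data.WriteAt(value, offset); err != nil {
                return err
            }
            return data.Sync()
        }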
