Large memory usage slows down unrelated code

I am maintaining the code for a Go project that reads and writes a lot of data and that has done so successfully for some time. Recently, I made a change: a CSV file with about 2 million records is loaded into a map with struct values at the beginning of the program. This map is only used in part B, but part A is executed first. And part A already runs noticeably slower than before (its processing time has quadrupled). That is very strange, since that part of the logic did not change. I have spent a week trying to explain how this can happen. Here are the steps I have taken (when I mention performance, I always refer to part A, which does not include the time to load the data into memory and actually has nothing to do with it):

The program was running on a server inside a Docker container. But I have been able to reproduce the problem on my laptop without a container: performance indeed decreases compared to a run without the data from the file loaded in memory.
The server had a huge amount of RAM. Although obviously more memory is used when the file is loaded, no limits are hit. I also did not see spikes or other strange patterns in memory usage and disk I/O. For these checks, I used pprof, htop and iotop.
When the data is loaded but the map is then set to nil, performance is OK again.
Loading the data into a slice instead of a map reduces the slowdown from x4 to x2 (but the memory usage is more or less the same as with the map).
This made me wonder whether the map/slice is accessed somewhere in part A, even though it shouldn’t be. The map is stored in a field of a struct type. I checked, and this struct is always passed by pointer (including to all goroutines). Making it a global variable instead of a pointer field did not solve the issue.
There is one dependency outside of the standard library. Is the problem caused by that library? It forces some garbage collections. Disabling this does not make a difference. I found another, unrelated library that does a similar job; using it as a replacement improves performance, but part A still takes longer when the data from the file is loaded.

Here I have plotted the metrics with and without the data in memory:

What could cause this effect or how do I find it out?

doujiang3997 2019-05-27 11:46
So if I get this right, your flow looks something like this:

Read 2 million rows from the CSV into a map of structs

Run part A (which doesn't need data from CSV)

Run part B, using data from CSV

Why read the data before you need it would be the first question, but that's perhaps beside the point.

What is likely is that the 2 million structs in the map are routinely being scanned by the garbage collector. Depending on the value of GOGC, the pacer component of the garbage collector is likely to kick in more often as the amount of memory allocated increases. Because this map is set aside for later use, there's nothing in it for the GC to free, but each cycle still spends time checking the data regardless. There are a number of things you could do to verify and account for this behaviour; all of them should help you confirm or rule out whether garbage collection is slowing you down.

Profile the code (obviously important for diagnostics). IIRC, the CPU profile shows GC interventions more readily.

Try disabling garbage collection (debug.SetGCPercent(-1))

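Store the map in a sync.Pool. This type is designed for objects you manage and reuse manually. Be aware, though, that pooled items are still subject to garbage collection and may be removed from the pool at any time without notification, so treat this as an experiment rather than a fix.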

Only read the CSV when you need it; don't read it before "part A".

Stream the file instead of reading it into a massive map. With 2 million rows, what's the value of holding all of it in memory rather than reading it line by line?
Selected by the asker as the best answer.
