doujianglin6704 2019-05-15 13:30
浏览 240

Golang Alloc和HeapAlloc与pprof的巨大差异

I have a Go program that calculates large correlation matrices in memory. To do this I've set up a pipeline of 3 goroutines where the first reads in files, the second calculates the correlation matrix and the last stores the result to disk.

Problem is, when I run the program, the Go runtime allocates ~17GB of memory while a matrix only takes up ~2-3GB. Using runtime.ReadMemStats shows that the program is using ~17GB (and verified by using htop), but pprof only reports about ~2.3GB.

If I look at the mem stats after running one file through the pipeline:

var mem runtime.MemStats
runtime.ReadMemStats(&mem)
fmt.Printf("Total alloc: %d GB
", mem.Alloc/1000/1000/1000)

This shows the total allocation of the program:

Total alloc: 17 GB

However, if I run go tool pprof mem.prof I get the following results:

(pprof) top5
Showing nodes accounting for 2.21GB, 100% of 2.21GB total
Showing top 5 nodes out of 9
      flat  flat%   sum%        cum   cum%
    1.20GB 54.07% 54.07%     1.20GB 54.07%  dataset.(*Dataset).CalcCorrelationMatrix
    1.02GB 45.93%   100%     1.02GB 45.93%  bytes.makeSlice
         0     0%   100%     1.02GB 45.93%  bytes.(*Buffer).WriteByte
         0     0%   100%     1.02GB 45.93%  bytes.(*Buffer).grow
         0     0%   100%     1.02GB 45.93%  encoding/json.Indent

So I am wondering how I can go about to find out why the program allocates 17 GB, when it seems that the peak memory usage is only ~2.5GB? Is there a way to trace the memory usage throughout the program using pprof?

EDIT

I ran the program again with GODEBUG=gctrace=1 and got the following trace:

gc 1 @0.017s 0%: 0.005+0.55+0.003 ms clock, 0.022+0/0.47/0.11+0.012 ms cpu, 1227->1227->1226 MB, 1228 MB goal, 4 P
gc 2 @14.849s 0%: 0.003+1.7+0.004 ms clock, 0.015+0/1.6/0.11+0.018 ms cpu, 1227->1227->1227 MB, 2452 MB goal, 4 P
gc 3 @16.850s 0%: 0.006+60+0.003 ms clock, 0.027+0/0.46/59+0.015 ms cpu, 1876->1876->1712 MB, 2455 MB goal, 4 P
gc 4 @22.861s 0%: 0.005+238+0.003 ms clock, 0.021+0/0.46/237+0.015 ms cpu, 3657->3657->3171 MB, 3658 MB goal, 4 P
gc 5 @30.716s 0%: 0.005+476+0.004 ms clock, 0.022+0/0.44/476+0.017 ms cpu, 5764->5764->5116 MB, 6342 MB goal, 4 P
gc 6 @46.023s 0%: 0.005+949+0.004 ms clock, 0.020+0/0.47/949+0.017 ms cpu, 10302->10302->9005 MB, 10303 MB goal, 4 P
gc 7 @64.878s 0%: 0.006+382+0.004 ms clock, 0.024+0/0.46/382+0.019 ms cpu, 16548->16548->7728 MB, 18011 MB goal, 4 P
gc 8 @89.774s 0%: 0.86+2805+0.006 ms clock, 3.4+0/24/2784+0.025 ms cpu, 20208->20208->17088 MB, 20209 MB goal, 4 P

So it is quite obvious that the heap grows steadily through the program, but I am not able to pinpoint where. I've profiled memory usage using pprof.WriteHeapProfile after calling the memory intensive functions:

func memoryProfile(profpath string) {

    if _, err := os.Stat(profpath); os.IsNotExist(err) {
        os.Mkdir(profpath, os.ModePerm)
    }

    f, err := os.Create(path.Join(profpath, "mem.mprof"))
    fmt.Printf("Creating memory profile in %s", "data/profile/mem.mprof
")
    if err != nil {
        panic(err)
    }

    if err := pprof.WriteHeapProfile(f); err != nil {
        panic(err)
    }
    f.Close()
}
  • 写回答

0条回答 默认 最新

    报告相同问题?

    悬赏问题

    • ¥15 飞机曲面部件如机翼,壁板等具体的孔位模型
    • ¥15 vs2019中数据导出问题
    • ¥20 云服务Linux系统TCP-MSS值修改?
    • ¥20 关于#单片机#的问题:项目:使用模拟iic与ov2640通讯环境:F407问题:读取的ID号总是0xff,自己调了调发现在读从机数据时,SDA线上并未有信号变化(语言-c语言)
    • ¥20 怎么在stm32门禁成品上增加查询记录功能
    • ¥15 Source insight编写代码后使用CCS5.2版本import之后,代码跳到注释行里面
    • ¥50 NT4.0系统 STOP:0X0000007B
    • ¥15 想问一下stata17中这段代码哪里有问题呀
    • ¥15 flink cdc无法实时同步mysql数据
    • ¥100 有人会搭建GPT-J-6B框架吗?有偿