dongyue5686 2012-04-29 22:58

Go: excessive memory usage when reusing map keys

As part of my Go tutorial, I'm writing a simple program that counts words across multiple files. I have a few goroutines that process files, each building a map[string]int that records how many occurrences of each word were found. Each map is then sent to a reducing goroutine, which aggregates the values into a single map. It sounds pretty straightforward and looks like a perfect (map-reduce) task for Go!

I have around 10k documents with 1.6 million unique words. What I found is that memory usage grows fast and steadily while the code runs, and I run out of memory about halfway through processing (12GB box, 7GB free). So yes, it uses gigabytes for this small data set!

Trying to figure out where the problem lies, I found that the reducer, which collects and aggregates the data, is to blame. Here is the code:

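func reduceWords(input chan map[string]int, output chan int) {
  // total accumulates the counts from every per-file map.
  total := make(map[string]int)
  for wordMap := range input {
    for w, c := range wordMap {
      total[w] += c // w is stored as the key without being copied
    }
  }
  // Report the number of unique words once input is closed.
  output <- len(total)
}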

If I remove the map from the sample above, memory stays within reasonable limits (a few hundred megabytes). What I found, though, is that taking a copy of the string also solves the problem, i.e. the following sample doesn't eat up my memory:

func reduceWords(input chan map[string]int, output chan int) {
  total := make(map[string]int)
  for wordMap := range input {
    for w, c := range wordMap {
      copyW := make([]byte, len(w)) // <-- make a copy of the word here!
      copy(copyW, w)
      total[string(copyW)] += c // the key is the copy, not w itself
    }
  }
  output <- len(total)
}
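
For reference, the explicit copy above can be written more compactly. This is only a sketch, assuming Go 1.18 or later, where the standard library provides strings.Clone; the helper name mergeCounts is hypothetical and not part of the original program.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeCounts (hypothetical helper) adds the counts from src into total.
// Each key is cloned before it is stored, so total never references the
// memory that the original word string was sliced from.
func mergeCounts(total, src map[string]int) {
	for w, c := range src {
		total[strings.Clone(w)] += c
	}
}

func main() {
	total := make(map[string]int)
	mergeCounts(total, map[string]int{"go": 2, "map": 1})
	mergeCounts(total, map[string]int{"go": 3})
	fmt.Println(total["go"], total["map"], len(total)) // prints "5 1 2"
}
```

The behaviour is the same as the make/copy version: every stored key owns its own small allocation.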

Is it possible that the wordMap instance is not being destroyed after every iteration when I use the value directly? (As a C++ programmer I have limited intuition when it comes to GC.) Is this desirable behaviour? Am I doing something wrong? Should I be disappointed with Go or rather with myself?

Thanks!

1 answer

  • dq1685513999 2012-04-29 23:13

    What does the code that turns files into strings look like? I would look for the problem there. If you are converting large blocks (whole files, maybe?) to strings and then slicing those into words, then saving any one word pins the entire block: a substring shares the memory of the string it was sliced from, so the garbage collector cannot free the block while the word is still referenced. Try keeping the blocks as []byte, slicing those into words, and then converting the words to the string type individually.
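
The suggestion above might look roughly like the following. This is a minimal sketch, not the asker's actual code: the function name countWords is hypothetical, and splitting on whitespace with bytes.Fields is an assumption about how words are tokenized. In a real program the block would come from reading a file into a []byte.

```go
package main

import (
	"bytes"
	"fmt"
)

// countWords (hypothetical helper) tokenizes a block held as []byte and
// converts each word to a string individually. The conversion allocates a
// fresh copy of just that word, so the map keys do not keep block alive.
func countWords(block []byte) map[string]int {
	counts := make(map[string]int)
	for _, word := range bytes.Fields(block) {
		w := string(word) // copies only the bytes of this word
		counts[w]++
	}
	return counts
}

func main() {
	block := []byte("the quick fox jumps over the lazy dog the end")
	counts := countWords(block)
	fmt.Println(counts["the"], len(counts)) // prints "3 8"
}
```

With this shape, each per-file map holds only word-sized strings, so the reducer can store the keys directly without pinning whole file contents.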

    This answer was accepted by the asker as the best answer.
