Go: excessive memory usage, memory leak

I am very, very careful with memory, as I have to write programs that cope with massive datasets.

Currently my application quickly reaches 32GB of memory, starts swapping, and then gets killed by the system.

I do not understand how this can be, since all variables are collectable (they are local to functions and quickly released) except the Tokens (TokensStruct) and TokensCount fields of the Trainer struct. TokensCount is just a uint. TokensStruct holds 1,000,000 rows, each a [5]uint32 plus a string key, so that is 20 bytes + a string per record, which we could call a maximum of 50 bytes per record. 50 × 1,000,000 = 50MB of memory required. So this program should not use much more than 50MB + overhead + temporary collectable variables in the functions (maybe another 50MB max). The maximum potential size of TokensStruct is 5,000,000 rows, as this is the size of dictionary, but even then it would be only 250MB. dictionary is a map and apparently uses around 600MB of memory, as that is how much the app uses right after starting, but this is not an issue because dictionary is loaded only once and never written to again.

Instead it uses 32GB of memory and then dies. Judging by how fast this happens, I expect it would happily reach 1TB if it could. Memory usage appears to grow linearly with the size of the files being loaded, as if no memory is ever cleared: everything that enters the app is allocated more memory, and that memory is never freed.

I tried calling runtime.GC() in case garbage collection wasn't running often enough, but this made no difference.
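Because runtime.GC() runs a full collection and blocks until it finishes, heap memory still in use afterwards is memory the program can still reach. A minimal sketch for checking this from inside the program with the standard runtime.ReadMemStats: HeapAlloc after a forced GC approximates the live heap, and NumGC shows whether collections happen at all. liveHeapBytes and gcCount are hypothetical helper names. The process size reported by the OS can stay above HeapAlloc, since the runtime does not return freed memory to the OS immediately.

```go
package main

import (
	"fmt"
	"runtime"
)

// liveHeapBytes (hypothetical helper) forces a full collection, then returns
// HeapAlloc, which at that point approximates the heap data still reachable.
func liveHeapBytes() uint64 {
	runtime.GC() // blocks until the collection has finished
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// gcCount (hypothetical helper) returns the number of completed GC cycles.
func gcCount() uint32 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.NumGC
}

func main() {
	// In the question's program this could be logged inside the Start() loop,
	// e.g. every 100 files; if the value keeps climbing, data is being retained.
	fmt.Printf("live heap: %d KB after %d GC cycles\n", liveHeapBytes()>>10, gcCount())
}
```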

Since memory usage increases linearly, this would imply a memory leak in getTokens() or LoadZip(). I don't see how that could be, since both are functions that do one task and then return. Or the tokens variable in Start() could be the cause of the leak. Basically it looks as if every file that is loaded and parsed is never released from memory, as that is the only way memory could fill up linearly and keep rising to 32GB and beyond.

Absolute nightmare! What's wrong with Go? Any way to fix this?

package main

import (
    "bytes"
    "compress/zlib"
    "fmt"
    "github.com/AlasdairF/BinSearch"
    "golang.org/x/text/transform"
    "golang.org/x/text/unicode/norm"
    "io/ioutil"
    "os"
    "regexp"
    "runtime"
    "strings"
    "unicode"
    "unicode/utf8"
)

type TokensStruct struct {
    binsearch.Key_string
    Value [][5]uint32
}

type Trainer struct {
    Tokens      TokensStruct
    TokensCount uint
}

func checkErr(err error) {
    if err == nil {
        return
    }
    fmt.Println(`Some Error:`, err)
    panic(err)
}

// Local helper function for normalization of UTF8 strings.
func isMn(r rune) bool {
    return unicode.Is(unicode.Mn, r) // Mn: nonspacing marks
}

// transliterations is used by removeAccentsBytesDashes to convert characters that have no accent-free decomposition.
var transliterations = map[rune]string{'Æ': "E", 'Ð': "D", 'Ł': "L", 'Ø': "OE", 'Þ': "Th", 'ß': "ss", 'æ': "e", 'ð': "d", 'ł': "l", 'ø': "oe", 'þ': "th", 'Œ': "OE", 'œ': "oe"}

// removeAccentsBytesDashes converts accented UTF-8 characters in b into their non-accented equivalents and replaces dashes with spaces.
func removeAccentsBytesDashes(b []byte) ([]byte, error) {
    mnBuf := make([]byte, len(b))
    t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
    n, _, err := t.Transform(mnBuf, b, true)
    if err != nil {
        return nil, err
    }
    mnBuf = mnBuf[:n]
    tlBuf := bytes.NewBuffer(make([]byte, 0, len(mnBuf)*2))
    for i, w := 0, 0; i < len(mnBuf); i += w {
        r, width := utf8.DecodeRune(mnBuf[i:])
        if r == '-' {
            tlBuf.WriteByte(' ')
        } else {
            if d, ok := transliterations[r]; ok {
                tlBuf.WriteString(d)
            } else {
                tlBuf.WriteRune(r)
            }
        }
        w = width
    }
    return tlBuf.Bytes(), nil
}

func LoadZip(filename string) ([]byte, error) {
    // Open file for reading
    fi, err := os.Open(filename)
    if err != nil {
        return nil, err
    }
    defer fi.Close()
    // Attach ZIP reader
    fz, err := zlib.NewReader(fi)
    if err != nil {
        return nil, err
    }
    defer fz.Close()
    // Pull
    data, err := ioutil.ReadAll(fz)
    if err != nil {
        return nil, err
    }
    return norm.NFC.Bytes(data), nil // return normalized
}

func getTokens(pibn string) []string {
    var data []byte
    var err error
    data, err = LoadZip(`/storedir/` + pibn + `/text.zip`)
    checkErr(err)
    data, err = removeAccentsBytesDashes(data)
    checkErr(err)
    data = bytes.ToLower(data)
    data = reg2.ReplaceAll(data, []byte("$2")) // remove contractions
    data = reg.ReplaceAllLiteral(data, nil)
    tokens := strings.Fields(string(data))
    return tokens
}

func (t *Trainer) Start() {
    data, err := ioutil.ReadFile(`list.txt`)
    checkErr(err)
    pibns := bytes.Fields(data)
    for i, pibn := range pibns {
        tokens := getTokens(string(pibn))
        t.addTokens(tokens)
        if i%100 == 0 {
            runtime.GC() // I added this just to try to stop the memory craziness, but it makes no difference
        }
    }
}

func (t *Trainer) addTokens(tokens []string) {
    for _, tok := range tokens {
        if _, ok := dictionary[tok]; ok {
            if indx, ok2 := t.Tokens.Find(tok); ok2 {
                ar := t.Tokens.Value[indx]
                ar[0]++
                t.Tokens.Value[indx] = ar
                t.TokensCount++
            } else {
                t.Tokens.AddKeyAt(tok, indx)
                t.Tokens.Value = append(t.Tokens.Value, [5]uint32{0, 0, 0, 0, 0})
                copy(t.Tokens.Value[indx+1:], t.Tokens.Value[indx:])
                t.Tokens.Value[indx] = [5]uint32{1, 0, 0, 0, 0}
                t.TokensCount++
            }
        }
    }
    return
}

func LoadDictionary() {
    dictionary = make(map[string]bool)
    data, err := ioutil.ReadFile(`dictionary`)
    checkErr(err)
    words := bytes.Fields(data)
    for _, word := range words {
        strword := string(word)
        dictionary[strword] = false
    }
}

var reg = regexp.MustCompile(`[^a-z0-9\s]`)
var reg2 = regexp.MustCompile(`\b(c|l|all|dall|dell|nell|sull|coll|pell|gl|agl|dagl|degl|negl|sugl|un|m|t|s|v|d|qu|n|j)'([a-z])`) //contractions
var dictionary map[string]bool

func main() {
    trainer := new(Trainer)
    LoadDictionary()
    trainer.Start()
}

2 answers

dongqi8863 2014-08-07 07:25

1. How large are "list.txt" and "dictionary"? If they are very large, it is no wonder memory usage is so high.

    pibns := bytes.Fields(data)

What is len(pibns)?

2. Enable GC tracing (run GODEBUG=gctrace=1 ./yourprogram) to see whether any garbage collection is happening.

3. Take a heap profile, for example like this:

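    // lookupMem forces a GC and writes two heap profiles to timestamped files:
    // a binary one for "go tool pprof" and a human-readable text one.
    // Requires imports: fmt, log, os, runtime, runtime/pprof, time.
    func lookupMem() {
        timestamp := fmt.Sprint(time.Now().Unix())
        if f, err := os.Create("mem_prof." + timestamp); err != nil {
            log.Printf("record memory profile failed: %v", err)
        } else {
            runtime.GC() // collect first so the profile reflects live data
            if err := pprof.WriteHeapProfile(f); err != nil {
                log.Printf("write heap profile failed: %v", err)
            }
            f.Close()
        }
        if f, err := os.Create("heap_prof." + timestamp); err != nil {
            log.Printf("heap profile failed: %v", err)
        } else {
            // debug=1 writes the text format with symbolized stacks
            pprof.Lookup("heap").WriteTo(f, 1)
            f.Close()
        }
    }

    func (t *Trainer) Start() {
    .......
        if i%1000 == 0 {
            // if len(pibns) is not very large, record some memory info
            lookupMem()
        }
    .......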
