han_mj 2021-06-28 18:04

Python | After manually deleting the generated .mdb database files, the disk space is not freed. Why?

Environment: Windows 10, Python 3.7, lmdb 0.9

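Problem description: For a CRNN experiment I am building my own dataset, and I use a helper function someone else wrote to store the image data and the labels in an LMDB (.mdb) database. The .mdb files are generated correctly, but after I delete them the disk space is not freed. For example, I requested 10 GB of space, and after deleting data.mdb and lock.mdb the drive did not get those 10 GB back. How can I fix this? Do I need to close the environment first? And if I did not close it beforehand and the files are already deleted, is there any way to get the space back?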
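One way to check whether the space actually comes back is to compare the free space on the drive before and after deleting the two files. A minimal sketch using the standard library's `shutil.disk_usage`; the helper names `free_bytes` and `delete_and_measure` are hypothetical, not part of the original script:

```python
import os
import shutil


def free_bytes(path: str) -> int:
    """Free space, in bytes, on the drive (filesystem) that contains `path`."""
    return shutil.disk_usage(path).free


def delete_and_measure(file_paths, drive_path: str) -> int:
    """Delete each existing file in `file_paths` and return the change in free space.

    A positive result means the drive gained that many free bytes. Other programs
    writing to the same drive at the same time will skew the number.
    """
    before = free_bytes(drive_path)
    for p in file_paths:
        if os.path.exists(p):
            os.remove(p)  # permanent delete; does not go through the Recycle Bin
    return free_bytes(drive_path) - before
```

For the dataset in this question, `drive_path` would be the directory given as `outputPath` in `getParam` (pass the directory, not one of the files being deleted), and `file_paths` the `data.mdb` and `lock.mdb` inside it. On Windows, `os.remove` raises `PermissionError` if some process still has the file open.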

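The code is as follows:

The `lmdb.open(...)` call in `createDataset` is where the 15 GB of space is requested, through its `map_size` argument.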

```python
import os
import lmdb  # install lmdb by "pip install lmdb"
import cv2
import numpy as np
import chardet


def checkImageIsValid(imageBin):
    if imageBin is None:
        return False

    # np.frombuffer replaces the deprecated np.fromstring for binary data
    imageBuf = np.frombuffer(imageBin, dtype=np.uint8)
    img = cv2.imdecode(imageBuf, cv2.IMREAD_GRAYSCALE)
    # cv2.imdecode returns None when the bytes are not a decodable image
    if img is None:
        return False

    imgH, imgW = img.shape[0], img.shape[1]
    if imgH * imgW == 0:
        return False
    return True


def writeCache(env, cache):
    # Write one batch of key/value pairs in a single write transaction
    with env.begin(write=True) as txn:
        for k, v in cache.items():
            if isinstance(v, str):
                v = v.encode()
            txn.put(k.encode(), v)


def createDataset(outputPath, imagePathList, labelList, lexiconList=None, checkValid=True):
    """
    Create LMDB dataset for CRNN training.

    ARGS:
        outputPath    : LMDB output path
        imagePathList : list of image path
        labelList     : list of corresponding groundtruth texts
        lexiconList   : (optional) list of lexicon lists
        checkValid    : if true, check the validity of every image
    """
    assert len(imagePathList) == len(labelList)
    nSamples = len(imagePathList)
    # 15 GB (15 * 1024**3 bytes) is requested here; after deleting the
    # corresponding files, this space was not freed
    env = lmdb.open(outputPath, map_size=16106127360)
    try:
        cache = {}
        cnt = 1
        for i in range(nSamples):
            imagePath = imagePathList[i]
            label = labelList[i]
            if not os.path.exists(imagePath):
                print('%s does not exist' % imagePath)
                continue
            with open(imagePath, 'rb') as f:
                imageBin = f.read()
            if checkValid and not checkImageIsValid(imageBin):
                print('%s is not a valid image' % imagePath)
                continue

            imageKey = 'image-%09d' % cnt
            labelKey = 'label-%09d' % cnt
            cache[imageKey] = imageBin
            cache[labelKey] = label
            if lexiconList:
                lexiconKey = 'lexicon-%09d' % cnt
                cache[lexiconKey] = ' '.join(lexiconList[i])
            # Flush the cache to LMDB every 1000 samples
            if cnt % 1000 == 0:
                writeCache(env, cache)
                cache = {}
                print('Written %d / %d' % (cnt, nSamples))
            cnt += 1
        nSamples = cnt - 1
        cache['num-samples'] = str(nSamples)
        writeCache(env, cache)
        print('Created dataset with %d samples' % nSamples)
    finally:
        # Close the environment so this process releases data.mdb and lock.mdb
        env.close()


def check_charset(file_path):
    with open(file_path, "rb") as f:
        data = f.read(4)
        charset = chardet.detect(data)['encoding']
    return charset


def getParam():
    outputPath = 'D:/MLWorkspace/crnn-master/crnn-master/data/train_data'
    datasetPath = 'D:/MLWorkspace/dataset/crnn/DataSet/'
    dataTrainTxtPath = datasetPath + 'data_train.txt'
    imagePath = datasetPath + 'Synthetic_Chinese_String_Dataset/images/'
    imagePathList = []
    labelPathList = []
    with open(dataTrainTxtPath) as f:
        for line in f:
            arr = line.split()
            filename = arr[0]
            # The label is the next (up to) 10 tokens, each followed by ','
            label = ''.join(item + ',' for item in arr[1:11])
            imagePathList.append(imagePath + filename)
            labelPathList.append(label)
    return outputPath, imagePathList, labelPathList


if __name__ == '__main__':
    params = getParam()
    createDataset(params[0], params[1], params[2])
```
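In `createDataset`, the literal `16106127360` passed as `map_size` is exactly 15 × 1024³ bytes, i.e. 15 GiB (what the question calls 15 GB). A minimal sketch that spells the value out as an expression; the helper name `gib` is hypothetical:

```python
def gib(n: int) -> int:
    """Convert a size in GiB to bytes (1 GiB = 1024**3 bytes)."""
    return n * 1024 ** 3


# Same value as the literal used in createDataset
MAP_SIZE = gib(15)
print(MAP_SIZE)  # 16106127360
```

With such a helper, the call could read `lmdb.open(outputPath, map_size=gib(15))`, which makes the requested size obvious at a glance.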
1 answer

  • CSDN专家-黄老师 (CSDN expert, Teacher Huang) 2021-06-28 21:35

    If you delete the file while the program is still running, the space it occupies is not released; you have to restart the program.
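The answer's point is that a running process can keep a deleted file's space in use. How that shows up depends on the operating system, and a small standard-library experiment can make it visible. A minimal sketch, not specific to LMDB; the function name `delete_while_open` is hypothetical. On POSIX systems the directory entry disappears, but the data stays readable (and allocated) through the open handle until it is closed; on Windows, deleting a file that Python has opened this way normally fails with `PermissionError`.

```python
import os
import tempfile


def delete_while_open(size: int = 1 << 20) -> str:
    """Try to delete a file while this process still holds it open.

    Returns "refused" if the OS blocks the delete (typical on Windows) or
    "still readable" if the name is removed but the data survives through the
    open handle (typical on Linux/macOS).
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb+") as f:
            f.write(b"\0" * size)
            f.flush()
            try:
                os.remove(path)
            except PermissionError:
                return "refused"
            # The directory entry is gone, but the handle still reaches the data,
            # so its disk space is only released once the handle is closed.
            f.seek(0)
            return "still readable" if len(f.read()) == size else "lost"
    finally:
        # Clean up if the delete was refused (the file is closed by now)
        if os.path.exists(path):
            os.remove(path)


print(delete_while_open())
```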
