PaddleOCR: Out of memory error on GPU

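I set up a Paddle environment on my company's Ubuntu server, and when training PaddleOCR I hit the error "Out of memory error on GPU 0. Cannot allocate 28.125000MB memory on GPU 0, 11.598755GB memory has been allocated...". The main fix suggested online is to reduce the batch size in the config file. I tried that and it still fails, even with the batch size set to 1. Does anyone know how to solve this?
Neither of the two GPUs is in use.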

Traceback (most recent call last):
  File "/home/sj/Project/PaddleOCR-main/tools/train.py", line 269, in <module>
    main(config, device, logger, vdl_writer, seed)
  File "/home/sj/Project/PaddleOCR-main/tools/train.py", line 222, in main
    program.train(
  File "/home/sj/Project/PaddleOCR-main/tools/program.py", line 345, in train
    preds = model(images, data=batch[1:])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sj/anaconda3/envs/paddle_env/lib/python3.11/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sj/Project/PaddleOCR-main/ppocr/modeling/architectures/base_model.py", line 85, in forward
    x = self.backbone(x)
        ^^^^^^^^^^^^^^^^
  File "/home/sj/anaconda3/envs/paddle_env/lib/python3.11/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sj/Project/PaddleOCR-main/ppocr/modeling/backbones/rec_lcnetv3.py", line 544, in forward
    x = self.blocks6(x)
        ^^^^^^^^^^^^^^^
  File "/home/sj/anaconda3/envs/paddle_env/lib/python3.11/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sj/anaconda3/envs/paddle_env/lib/python3.11/site-packages/paddle/nn/layer/container.py", line 615, in forward
    input = layer(input)
            ^^^^^^^^^^^^
  File "/home/sj/anaconda3/envs/paddle_env/lib/python3.11/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sj/Project/PaddleOCR-main/ppocr/modeling/backbones/rec_lcnetv3.py", line 390, in forward
    x = self.pw_conv(x)
        ^^^^^^^^^^^^^^^
  File "/home/sj/anaconda3/envs/paddle_env/lib/python3.11/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sj/Project/PaddleOCR-main/ppocr/modeling/backbones/rec_lcnetv3.py", line 223, in forward
    out += self.identity(x)
MemoryError: 

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::pybind::CallScalarFuction(paddle::Tensor const&, double, std::string)
1   scale_ad_func(paddle::Tensor const&, paddle::experimental::ScalarBase<paddle::Tensor>, paddle::experimental::ScalarBase<paddle::Tensor>, bool)
2   paddle::experimental::scale(paddle::Tensor const&, paddle::experimental::ScalarBase<paddle::Tensor> const&, paddle::experimental::ScalarBase<paddle::Tensor> const&, bool)
3   void phi::ScaleKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, paddle::experimental::ScalarBase<phi::DenseTensor> const&, paddle::experimental::ScalarBase<phi::DenseTensor> const&, bool, phi::DenseTensor*)
4   float* phi::DeviceContext::Alloc<float>(phi::TensorBase*, unsigned long, bool) const
5   phi::DeviceContext::Impl::Alloc(phi::TensorBase*, phi::Place const&, phi::DataType, unsigned long, bool, bool) const
6   phi::DenseTensor::AllocateFrom(phi::Allocator*, phi::DataType, unsigned long, bool)
7   paddle::memory::allocation::Allocator::Allocate(unsigned long)
8   paddle::memory::allocation::StatAllocator::AllocateImpl(unsigned long)
9   paddle::memory::allocation::Allocator::Allocate(unsigned long)
10  paddle::memory::allocation::Allocator::Allocate(unsigned long)
11  paddle::memory::allocation::Allocator::Allocate(unsigned long)
12  paddle::memory::allocation::Allocator::Allocate(unsigned long)
13  paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
14  std::string phi::enforce::GetCompleteTraceBackString<std::string >(std::string&&, char const*, int)
15  common::enforce::GetCurrentTraceBackString[abi:cxx11](bool)

----------------------
Error Message Summary:
----------------------
ResourceExhaustedError: 

Out of memory error on GPU 0. Cannot allocate 28.125000MB memory on GPU 0, 11.602661GB memory has been allocated and available memory is only 22.687500MB.

Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model. 
 (at ../paddle/fluid/memory/allocation/cuda_allocator.cc:86)
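To read the numbers in that summary more easily, here is a minimal sketch that pulls them out of the message. The helper name `parse_oom` and the regular expression are illustrative assumptions written against the exact wording printed in this thread; they are not part of PaddlePaddle.

```python
import re

# Pattern written against the summary line shown in this thread (an assumption:
# other PaddlePaddle versions may word the message differently).
OOM_PATTERN = re.compile(
    r"Cannot allocate (?P<requested_mb>[\d.]+)MB memory on GPU (?P<gpu>\d+), "
    r"(?P<allocated_gb>[\d.]+)GB memory has been allocated and "
    r"available memory is only (?P<available_mb>[\d.]+)MB"
)


def parse_oom(message: str) -> dict:
    """Extract the requested, allocated and available sizes from the summary."""
    match = OOM_PATTERN.search(message)
    if match is None:
        raise ValueError("not a recognised out-of-memory summary")
    info = {
        "gpu": int(match["gpu"]),
        "requested_mb": float(match["requested_mb"]),
        "allocated_gb": float(match["allocated_gb"]),
        "available_mb": float(match["available_mb"]),
    }
    # How much more memory the failing allocation needed than was free.
    info["shortfall_mb"] = info["requested_mb"] - info["available_mb"]
    return info


message = (
    "Out of memory error on GPU 0. Cannot allocate 28.125000MB memory on GPU 0, "
    "11.602661GB memory has been allocated and available memory is only 22.687500MB."
)
info = parse_oom(message)
print(info)
```

Read this way, the failing request is small (about 28 MB); the real issue is that roughly 11.6 GB is already allocated on GPU 0, so the thing to look for is whatever is holding that memory.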

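For reference, here are the config files for the text detection model (det) and the text recognition model (rec). The recognition model (rec) is the one that fails; the detection model (det) trains fine.
1. Text detection model, det (runs)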

Global:
  debug: false
  use_gpu: true
  epoch_num: 800
  log_smooth_window: 20
  print_batch_step: 20
  save_model_dir: ./output/ch_PP-OCRv4
  save_epoch_step: 100
  eval_batch_step:
  - 0
  - 20000
  cal_metric_during_train: false
  checkpoints: null
  pretrained_model: null
  save_inference_dir: null
  use_visualdl: false
  infer_img: doc/imgs_en/img_10.jpg
  save_res_path: ./checkpoints/det_db/predicts_db.txt
  distributed: true
Architecture:
  name: DistillationModel
  algorithm: Distillation
  model_type: det
  Models:
    Student:
      model_type: det
      algorithm: DB
      Transform: null
      Backbone:
        name: PPLCNetV3
        scale: 0.75
        pretrained: false
        det: true
      Neck:
        name: RSEFPN
        out_channels: 96
        shortcut: true
      Head:
        name: DBHead
        k: 50
    Student2:
      pretrained: null
      model_type: det
      algorithm: DB
      Transform: null
      Backbone:
        name: PPLCNetV3
        scale: 0.75
        pretrained: true
        det: true
      Neck:
        name: RSEFPN
        out_channels: 96
        shortcut: true
      Head:
        name: DBHead
        k: 50
    Teacher:
      pretrained: https://paddleocr.bj.bcebos.com/PP-OCRv4/chinese/ch_PP-OCRv4_det_cml_teacher_pretrained/teacher.pdparams
      freeze_params: true
      return_all_feats: false
      model_type: det
      algorithm: DB
      Backbone:
        name: ResNet_vd
        in_channels: 3
        layers: 50
      Neck:
        name: LKPAN
        out_channels: 256
      Head:
        name: DBHead
        kernel_list:
        - 7
        - 2
        - 2
        k: 50
Loss:
  name: CombinedLoss
  loss_config_list:
  - DistillationDilaDBLoss:
      weight: 1.0
      model_name_pairs:
      - - Student
        - Teacher
      - - Student2
        - Teacher
      key: maps
      balance_loss: true
      main_loss_type: DiceLoss
      alpha: 5
      beta: 10
      ohem_ratio: 3
  - DistillationDMLLoss:
      model_name_pairs:
      - Student
      - Student2
      maps_name: thrink_maps
      weight: 1.0
      key: maps
  - DistillationDBLoss:
      weight: 1.0
      model_name_list:
      - Student
      - Student2
      balance_loss: true
      main_loss_type: DiceLoss
      alpha: 5
      beta: 10
      ohem_ratio: 3
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 5.0e-05
PostProcess:
  name: DistillationDBPostProcess
  model_name:
  - Student
  key: head_out
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5
Metric:
  name: DistillationMetric
  base_metric_name: DetMetric
  main_indicator: hmean
  key: Student
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/sj/Project/PaddleOCR-main/Datasets/det/train/
    label_file_list:
      - /home/sj/Project/PaddleOCR-main/Datasets/det/train.txt
    ratio_list: [1.0]
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - IaaAugment:
        augmenter_args:
        - type: Fliplr
          args:
            p: 0.5
        - type: Affine
          args:
            rotate:
            - -10
            - 10
        - type: Resize
          args:
            size:
            - 0.5
            - 3
    - EastRandomCropData:
        size:
        - 640
        - 640
        max_tries: 50
        keep_ratio: true
    - MakeBorderMap:
        shrink_ratio: 0.4
        thresh_min: 0.3
        thresh_max: 0.7
        total_epoch: 500
    - MakeShrinkMap:
        shrink_ratio: 0.4
        min_text_size: 8
        total_epoch: 500
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - threshold_map
        - threshold_mask
        - shrink_map
        - shrink_mask
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 4
    num_workers: 12
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/sj/Project/PaddleOCR-main/Datasets/det/val/
    label_file_list:
      - /home/sj/Project/PaddleOCR-main/Datasets/det/val.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - DetResizeForTest: 
        limit_side_len: 960
        limit_type: max
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - shape
        - polys
        - ignore_tags
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 1
    num_workers: 12
profiler_options: null

2. Text recognition model, rec (fails)

Global:
  debug: false
  use_gpu: true
  epoch_num: 300
  log_smooth_window: 20
  print_batch_step: 100
  save_model_dir: ./output/rec_ppocr_v4
  save_epoch_step: 50
  eval_batch_step:
  - 0
  - 2000
  cal_metric_during_train: false
#  pretrained_model: pretrain_model/en_PP-OCRv4_rec_train/best_accuracy.pdparams
  pretrained_model: null
  checkpoints: null
  save_inference_dir: null
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/en_dict.txt
  max_text_length: 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0005
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform: null
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
    - CTCHead:
        Neck:
          name: svtr
          dims: 120
          depth: 2
          hidden_dims: 120
          kernel_size:
          - 1
          - 3
          use_guide: true
        Head:
          fc_decay: 1.0e-05
    - NRTRHead:
        nrtr_dim: 384
        max_text_length: 25
Loss:
  name: MultiLoss
  loss_config_list:
  - CTCLoss: null
  - NRTRLoss: null
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: false
Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: Datasets/rec/train/
    ext_op_transform_idx: 1
    label_file_list:
    - Datasets/rec/train.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape:
        - 48
        - 320
        - 3
        max_text_length: 25
    - RecAug: null
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales:
    - - 320
      - 32
    - - 320
      - 48
    - - 320
      - 64
    first_bs: 96
    fix_bs: false
    divided_factor:
    - 8
    - 16
    is_training: true
  loader:
    shuffle: true
    batch_size_per_card: 64
    drop_last: true
    num_workers: 8
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: Datasets/rec/val/
    label_file_list:
    - Datasets/rec/val.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape:
        - 3
        - 48
        - 320
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 64
    num_workers: 8
profiler_options: null
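
One detail in the rec config may explain why lowering the batch size had no effect: training uses `MultiScaleSampler`, which carries its own `first_bs: 96`. It is an assumption here, worth verifying in your PaddleOCR checkout, that when `Train.sampler` is present the sampler's `first_bs` sets the training batch size and `loader.batch_size_per_card` is not what the training loop sees. Under that assumption, a minimal sketch of lowering both values together looks like this (the helper `shrink_rec_batch` is hypothetical, and the dict below is a trimmed stand-in for the YAML above):

```python
import copy


def shrink_rec_batch(config: dict, batch_size: int) -> dict:
    """Return a copy of a PaddleOCR-style config with smaller training batches."""
    cfg = copy.deepcopy(config)  # leave the caller's config untouched
    train = cfg["Train"]
    train["loader"]["batch_size_per_card"] = batch_size
    sampler = train.get("sampler")
    # Assumption: with a MultiScaleSampler, first_bs is the value that matters.
    if sampler is not None and "first_bs" in sampler:
        sampler["first_bs"] = batch_size
    return cfg


# Trimmed stand-in for the rec config shown above.
rec_config = {
    "Train": {
        "sampler": {"name": "MultiScaleSampler", "first_bs": 96, "fix_bs": False},
        "loader": {"batch_size_per_card": 64, "num_workers": 8},
    }
}

smaller = shrink_rec_batch(rec_config, 8)
print(smaller["Train"]["sampler"]["first_bs"],
      smaller["Train"]["loader"]["batch_size_per_card"])
```

Since `tools/train.py` accepts `-o key=value` overrides, the same change could presumably be tried without editing the file, for example by passing `-o Train.sampler.first_bs=8 Train.loader.batch_size_per_card=8`.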

31 answers

阿里嘎多学长 2024-12-04 14:22
(Answer compiled by 阿里嘎多学长 from AIGC-generated output.)

PaddleOCR: Out of memory error on GPU

You ran into an out-of-memory error on the GPU while training PaddleOCR. This may be caused by insufficient GPU memory or by other factors.

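Check GPU memory: use the nvidia-smi command or a similar GPU monitoring tool to see the current GPU memory usage.
Reduce the model size: if GPU memory is insufficient, try a smaller model, for example one with fewer layers or fewer channels.
Increase swap space: swap only extends host (CPU) memory, so it is a temporary workaround when host RAM runs out, at some cost in performance; it does not add GPU memory.
Adjust hyperparameters: of the usual hyperparameters, batch_size is the one that drives memory use; learning_rate does not affect it.
Use data augmentation: augmentation (adding noise, flips, rotations) increases the quantity and variety of training data, but it does not reduce GPU memory use.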

Specifically, you can try the following code:

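import paddle
import paddleocr

# ...
# Reduce the batch size (for training, the value that takes effect is the one in the YAML config)
batch_size = 2
# Choose a small model
model = paddleocr.PaddleOCR(show_log=True, use_gpu=True)
# ...
# The PaddleOCR class is an inference pipeline and has no train() method;
# training in this thread is launched through tools/train.py, as the traceback shows.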

If none of the above solves the problem, please provide more log output and configuration details so it can be troubleshot further.
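
The first suggestion above (checking GPU memory) can be scripted. The following is a rough sketch that assumes `nvidia-smi` is on the PATH; the helper names and the sample output are made up for illustration, and only the parsing step is exercised here so the example also runs on a machine without a GPU.

```python
import subprocess

# nvidia-smi query that reports per-GPU memory in MiB as plain CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(output: str) -> list:
    """Turn lines like '0, 11880, 12288' into dicts with a free_mib field."""
    gpus = []
    for line in output.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        gpus.append({
            "index": index,
            "used_mib": used,
            "total_mib": total,
            "free_mib": total - used,
        })
    return gpus


def query_gpu_memory() -> list:
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Hypothetical output for a two-GPU machine; real numbers will differ.
sample = "0, 11880, 12288\n1, 3, 12288\n"
gpus = parse_gpu_memory(sample)
emptiest = max(gpus, key=lambda gpu: gpu["free_mib"])
print("GPU with the most free memory:", emptiest["index"])
```

If GPU 0 turns out to be nearly full before training even starts while the other GPU is idle, setting the `CUDA_VISIBLE_DEVICES` environment variable to the idle GPU's index before launching training restricts the process to that device, which matches the error message's advice to start PaddlePaddle on another GPU.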

Question events
Closed by the system: December 12
Question edited: December 4
Question created: December 4
