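Symptoms and background
I want to prune YOLOv5, so I am running sparsity training on it first. Training fails with: RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB.
Relevant code (please do not paste screenshots)
Sparsity training command: python train_sparsity.py --st --sr 0.001 --weights ./runs/train/exp2/weights/last.pt --data ./data/dataset.yaml --epochs 150 --imgsz 512
Output and error message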
Traceback (most recent call last):
  File "train_sparsity.py", line 674, in <module>
    main(opt)
  File "train_sparsity.py", line 571, in main
    train(opt.hyp, opt, device, callbacks)
  File "train_sparsity.py", line 321, in train
    pred = model(imgs) # forward
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\shuyf\yolov5\models\yolo.py", line 127, in forward
    return self._forward_once(x, profile, visualize) # single-scale inference, train
  File "C:\Users\shuyf\yolov5\models\yolo.py", line 150, in _forward_once
    x = m(x) # run
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\shuyf\yolov5\models\common.py", line 178, in forward
    return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\shuyf\yolov5\models\common.py", line 46, in forward
    return self.act(self.bn(self.conv(x)))
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\activation.py", line 395, in forward
    return F.silu(input, inplace=self.inplace)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py", line 1897, in silu
    return torch._C._nn.silu_(input)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 2.00 GiB total capacity; 1.13 GiB already allocated; 9.55 MiB free; 1.14 GiB reserved in total by PyTorch)
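The sizes in the last line of the traceback can be compared directly. Below is a minimal sketch that parses that line and converts everything to MiB. The helper name `parse_oom` and the `unaccounted` figure are illustrative assumptions, not part of PyTorch or of train_sparsity.py.

```python
import re

# The final line of the traceback, copied verbatim.
OOM_LINE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB "
    "(GPU 0; 2.00 GiB total capacity; 1.13 GiB already allocated; "
    "9.55 MiB free; 1.14 GiB reserved in total by PyTorch)"
)

_UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Hypothetical helper: return the sizes named in the message, in MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (KiB|MiB|GiB)",
        "total": r"([\d.]+) (KiB|MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (KiB|MiB|GiB) already allocated",
        "free": r"([\d.]+) (KiB|MiB|GiB) free",
        "reserved": r"([\d.]+) (KiB|MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * _UNIT_TO_MIB[match.group(2)]
    return sizes


sizes = parse_oom(OOM_LINE)

# Memory that is neither free nor reserved by PyTorch (assumed here to be
# the CUDA context, the display, or other users of the GPU).
unaccounted = sizes["total"] - sizes["reserved"] - sizes["free"]

for name, mib in sizes.items():
    print(f"{name:>10}: {mib:8.2f} MiB")
print(f"{'other':>10}: {unaccounted:8.2f} MiB")
```

Read this way, the 20 MiB request is larger than the 9.55 MiB reported free, and a sizeable part of the 2 GiB card is not held by PyTorch at all.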
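My approach and what I have tried
I checked GPU memory with nvidia-smi, but it only reported "No running processes found".
What I want to achieve
Training runs through all 150 epochs without errors.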
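"No running processes found" refers to the process table of nvidia-smi. The per-GPU memory totals are reported separately. As a rough sketch, they can be read with the `--query-gpu` option and checked before and during training. The function names below are hypothetical, and the parser assumes plain integer values in MiB (it does not handle fields such as `[N/A]`).

```python
import shutil
import subprocess

# nvidia-smi query that prints "used, total" in MiB, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse 'used, total' lines into a list of (used_mib, total_mib) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Run nvidia-smi; return per-GPU (used, total) in MiB, or None if it is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)
```

Calling `query_gpu_memory()` once before starting train_sparsity.py and again while it runs would show how much of the card is already in use before PyTorch allocates anything.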