Problem description and background
The lab server has two GPUs, both NVIDIA 1080 Ti, and the programs run under the PyTorch framework. The error below occurs under these conditions:
Running a program on GPU 0 alone: no problem.
Running a program on GPU 1 alone: also no problem.
Running programs on both GPUs at once, with the GPU 0 program started first: the GPU 1 program fails to start and reports the error immediately.
Running programs on both GPUs at once, with the GPU 1 program started first: the GPU 0 program runs normally, but as soon as it starts, the GPU 1 program stops immediately and reports the error.
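One detail worth noting: the out-of-memory error below is reported on GPU 0 even when the failing program was the one assigned to GPU 1, which suggests both processes may be placing some tensors on GPU 0. A common way to rule this out is to hide the other GPU from each process via the `CUDA_VISIBLE_DEVICES` environment variable before CUDA is initialized. A minimal launcher sketch, assuming device selection happens in the training script (the `launch_on_gpu` helper is hypothetical; `DSAN.py` is the script named in the traceback):

```python
import os
import subprocess
import sys

def launch_on_gpu(script, gpu_id):
    """Start `script` in a child process that can see only GPU `gpu_id`.

    Inside the child process, the chosen GPU is addressed as "cuda:0",
    so the training code needs no per-GPU device logic of its own.
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)  # hide all other GPUs
    return subprocess.Popen([sys.executable, script], env=env)

# Usage (one training process per physical GPU):
#   procs = [launch_on_gpu("DSAN.py", gpu) for gpu in (0, 1)]
#   for p in procs:
#       p.wait()
```

Equivalently, on the Windows command line you can run `set CUDA_VISIBLE_DEVICES=1` before `python DSAN.py`. Either way, the variable must be set before PyTorch initializes CUDA, i.e. before the first CUDA call in the process.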
Relevant code (please do not paste screenshots)
Output and error message
Traceback (most recent call last):
  File "DSAN.py", line 184, in <module>
    train(epoch, model)
  File "DSAN.py", line 99, in train
    label_source_pred, loss_mmd = model(data_source, data_target, label_source)
  File "D:\ProgramFiles\Anaconda3-py3.7.1\envs\zcm\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\zcm\20210530resnest实验2\resnest.py", line 55, in forward
    source = self.feature_layers(source)
  File "D:\ProgramFiles\Anaconda3-py3.7.1\envs\zcm\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\zcm\20210530resnest实验2\resnet_new.py", line 306, in forward
    x = self.layer2(x)
  File "D:\ProgramFiles\Anaconda3-py3.7.1\envs\zcm\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\ProgramFiles\Anaconda3-py3.7.1\envs\zcm\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
    input = module(input)
  File "D:\ProgramFiles\Anaconda3-py3.7.1\envs\zcm\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\zcm\20210530resnest实验2\resnet_new.py", line 118, in forward
    out = self.conv2(out)
  File "D:\ProgramFiles\Anaconda3-py3.7.1\envs\zcm\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\zcm\20210530resnest实验2\splat.py", line 65, in forward
    gap = sum(splited)
RuntimeError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 11.00 GiB total capacity; 1.03 GiB already allocated; 7.46 GiB free; 1.12 GiB reserved in total by PyTorch)
My approach and methods tried
Desired result
I would like to know how to resolve the problem described above.