晶焰焰儿 2020-03-31 20:36

Problem using multiple GPUs with PyTorch

Problem description: the same code runs without errors on a single GPU, but fails as soon as it is run on multiple GPUs:

Traceback (most recent call last):
  File "trainer.py", line 370, in <module>
    trainer.train()
  File "trainer.py", line 263, in train
    self.x_tilde = self.G(self.z)
  File "G:\anaconda\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "G:\anaconda\lib\site-packages\torch\nn\parallel\data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "G:\anaconda\lib\site-packages\torch\nn\parallel\data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "G:\anaconda\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "G:\anaconda\lib\site-packages\torch\_utils.py", line 394, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "G:\anaconda\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "G:\anaconda\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\Work_place\pggan-pytorch-master的副本\network.py", line 181, in forward
    x = self.model(x.view(x.size(0), -1, 1, 1))
  File "G:\anaconda\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "G:\anaconda\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
    input = module(input)
  File "G:\anaconda\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "G:\anaconda\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
    input = module(input)
  File "G:\anaconda\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "E:\Work_place\pggan-pytorch-master的副本\custom_layers.py", line 113, in forward
    x = self.conv(x.mul(self.scale))
  File "G:\anaconda\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "G:\anaconda\lib\site-packages\torch\nn\modules\conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "G:\anaconda\lib\site-packages\torch\nn\modules\conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM

The two GPUs are:
GPU 0: GTX 1660,
GPU 1: GTX 1060.
Both cards are the 6 GB versions.
I can't tell what is going wrong here. Any pointers would be appreciated.
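
Nothing in this thread confirms the cause, but with a mixed pair like this one it may be worth checking whether the cards share a compute capability: the GTX 1660 is a Turing part (7.5) and the GTX 1060 is a Pascal part (6.1). The sketch below is only a diagnostic aid under that assumption. The helper names `group_by_capability`, `pick_homogeneous_device_ids` and `query_capabilities` are hypothetical; the only PyTorch calls used are `torch.cuda.device_count()` and `torch.cuda.get_device_capability()`.

```python
from collections import defaultdict


def group_by_capability(capabilities):
    """Group device indices by their (major, minor) compute capability.

    `capabilities` maps a device index to a (major, minor) tuple.
    """
    groups = defaultdict(list)
    for index, cap in sorted(capabilities.items()):
        groups[tuple(cap)].append(index)
    return dict(groups)


def pick_homogeneous_device_ids(capabilities):
    """Return the largest set of devices that share one compute capability.

    Ties go to the group containing the lowest device index.
    Returns an empty list when no devices are given.
    """
    groups = group_by_capability(capabilities)
    if not groups:
        return []
    return max(groups.values(), key=lambda ids: (len(ids), -ids[0]))


def query_capabilities():
    """Ask PyTorch for the compute capability of every visible CUDA device."""
    import torch  # imported lazily so the helpers above work without PyTorch

    return {
        index: torch.cuda.get_device_capability(index)
        for index in range(torch.cuda.device_count())
    }


# Values for the two cards named in the question (assumed, not measured here).
example = {0: (7, 5), 1: (6, 1)}
print(group_by_capability(example))
print(pick_homogeneous_device_ids(example))
```

On a real machine, `pick_homogeneous_device_ids(query_capabilities())` gives a list that could be passed as `device_ids` to `nn.DataParallel`. If the error disappears when only matching cards are used, that points toward the hardware mix rather than the model code. It does not prove it.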

3 answers

  • Antony4theDay 2020-10-12 19:12
    Same question here:
    Traceback (most recent call last):
    File "main.py", line 292, in <module>
    main()
    File "main.py", line 91, in main
    train_op(net, args)
    File "main.py", line 157, in train_op
    loss = net.deterministic_forward(data)
    File "/mnt/lustre/dengandong/self-disentangle/model/network.py", line 63, in deterministic_forward
    self.z_c, self.gap, self.reconstructed_gap = self.dCE(self.true) # ae
    File "/mnt/lustre/dengandong/anaconda3/envs/video_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/mnt/lustre/dengandong/anaconda3/envs/video_torch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
    File "/mnt/lustre/dengandong/anaconda3/envs/video_torch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
    File "/mnt/lustre/dengandong/anaconda3/envs/video_torch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
    File "/mnt/lustre/dengandong/anaconda3/envs/video_torch/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
    RuntimeError: Caught RuntimeError in replica 0 on device 0.
    Original Traceback (most recent call last):
    File "/mnt/lustre/dengandong/anaconda3/envs/video_torch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
    File "/mnt/lustre/dengandong/anaconda3/envs/video_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/mnt/lustre/dengandong/self-disentangle/model/autoencoder/ae_3dcnn.py", line 64, in forward
    content_code = self.encoder(reduce_frames)
    File "/mnt/lustre/dengandong/anaconda3/envs/video_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/mnt/lustre/dengandong/anaconda3/envs/video_torch/lib/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
    File "/mnt/lustre/dengandong/anaconda3/envs/video_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/mnt/lustre/dengandong/self-disentangle/model/__init__.py", line 93, in forward
    x = conv(x)
    File "/mnt/lustre/dengandong/anaconda3/envs/video_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/mnt/lustre/dengandong/self-disentangle/model/__init__.py", line 30, in forward
    x = self.conv(x)
    File "/mnt/lustre/dengandong/anaconda3/envs/video_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/mnt/lustre/dengandong/anaconda3/envs/video_torch/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 419, in forward
    return self._conv_forward(input, self.weight)
    File "/mnt/lustre/dengandong/anaconda3/envs/video_torch/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 416, in _conv_forward
    self.padding, self.dilation, self.groups)
    RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
