yolov5 error: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor

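I recently started learning yolov5 and am still a beginner. We are required to use yolov5 v1.0, and while running train.py from the source code I ran into a problem I can't solve; I've been stuck on it for more than ten hours without any leads. I'm using PyTorch 2.0.1 with CUDA 11.8.
The error is: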

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

I made almost no changes to the source code. The only change was adding the following at roughly line 126 of yolo.py:

with torch.no_grad():

The actual code block is:

    def _initialize_biases(self, cf=None):  # initialize biases into Detect(), cf is class frequency
        # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
        m = self.model[-1]  # Detect() module
        for f, s in zip(m.f, m.stride):  #  from
            mi = self.model[f % m.i]
            # mi.to(device=torch.device('cuda:0'))
            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
            with torch.no_grad():
                b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
                b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls
            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)
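For context on the torch.no_grad() change: b is a view of the bias, which is a leaf tensor that requires grad, and current PyTorch versions raise a RuntimeError for an in-place += on such a view unless autograd is disabled. The minimal sketch below reproduces the same initialization on a standalone nn.Conv2d; init_detect_bias and the values na=3, nc=80, stride=8 are assumptions for illustration, not yolov5 code. Note that this change does not move any tensor between devices.

```python
import math

import torch
import torch.nn as nn


def init_detect_bias(conv: nn.Conv2d, na: int, nc: int, stride: float) -> None:
    """Hypothetical helper mirroring the bias init in _initialize_biases (sketch only)."""
    # Reshape the flat bias (na * (nc + 5),) into (na, nc + 5): one row per anchor.
    b = conv.bias.view(na, -1)
    # Edit the values in place without recording the ops in autograd.
    # Without no_grad(), PyTorch rejects in-place ops on a view of a leaf that requires grad.
    with torch.no_grad():
        b[:, 4] += math.log(8 / (640 / stride) ** 2)  # objectness prior: ~8 objects per 640 image
        b[:, 5:] += math.log(0.6 / (nc - 0.99))  # class prior
    # Re-wrap as a fresh trainable Parameter, as the original code does.
    conv.bias = nn.Parameter(b.view(-1), requires_grad=True)


conv = nn.Conv2d(128, 3 * 85, kernel_size=1)  # 3 anchors x (80 classes + 5)
before = conv.bias.detach().clone()
init_detect_bias(conv, na=3, nc=80, stride=8)
print(conv.bias.shape, conv.bias.requires_grad)
```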

Then running train.py fails with:

Traceback (most recent call last):
  File "C:\Users\ASUS\Desktop\Study\DeepLearning\New Try\New try2\yolov5-1.0\train.py", line 409, in <module>
    train(hyp)
  File "C:\Users\ASUS\Desktop\Study\DeepLearning\New Try\New try2\yolov5-1.0\train.py", line 266, in train
    loss, loss_items = compute_loss(pred, targets.to(device), model)
  File "C:\Users\ASUS\Desktop\Study\DeepLearning\New Try\New try2\yolov5-1.0\utils\utils.py", line 423, in compute_loss
    tcls, tbox, indices, anchors = build_targets(p, targets, model)  # targets
  File "C:\Users\ASUS\Desktop\Study\DeepLearning\New Try\New try2\yolov5-1.0\utils\utils.py", line 508, in build_targets
    a, t = at[j], t.repeat(na, 1, 1)[j]  # filter
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

Could anyone tell me how to fix this? Thanks.
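A note on the traceback: the failing line is a, t = at[j], t.repeat(na, 1, 1)[j] in build_targets. The message says the indexed tensor is on the CPU, and since t comes from targets.to(device) (train.py line 266), the CPU tensor is most likely at, while the boolean mask j is on the GPU; PyTorch 2.0.1 refuses to index a CPU tensor with a CUDA mask. I have not inspected utils.py itself, so this is a minimal sketch under that assumption: filter_anchor_targets is a hypothetical stand-in for the filtering step, and the fix it illustrates is creating at on targets.device.

```python
import torch


def filter_anchor_targets(targets: torch.Tensor, mask_j: torch.Tensor, na: int):
    """Hypothetical stand-in for the filtering step in build_targets (sketch only).

    targets: (nt, 6) tensor; mask_j: (na, nt) boolean mask, both on the training device.
    """
    nt = targets.shape[0]
    # Build the anchor-index tensor on the SAME device as the targets and mask.
    # Without device=..., torch.arange creates it on the CPU, and indexing that
    # CPU tensor with a CUDA mask raises the RuntimeError from the traceback.
    at = torch.arange(na, device=targets.device).view(na, 1).repeat(1, nt)
    a = at[mask_j]  # anchor index for each kept (anchor, target) pair
    t = targets.repeat(na, 1, 1)[mask_j]  # the matching target rows
    return a, t


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
targets = torch.rand(4, 6, device=device)
mask_j = torch.tensor([[True, False, True, False],
                       [False, False, True, True],
                       [True, True, False, False]], device=device)
a, t = filter_anchor_targets(targets, mask_j, na=3)
print(a.tolist(), t.shape)
```

If at is defined in build_targets without a device argument, either adding device=targets.device there or calling at = at.to(targets.device) before the loop should have the same effect.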

Answer from CSDN-Ada assistant (official CSDN-AI account), 2023-09-20 02:06
[Related recommendation]

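The section "Problem 6: RuntimeError: All input tensors must be on the same device. Received cpu and cuda:0" of the blog post "Some problems encountered when training your own model with yolov5, and their solutions" may help with your problem. You can read the content below carefully or go to the original blog post: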

This was the problem that took me the longest to solve, so I have to record it properly!

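Error reported at: stats = [torch.cat(x, 0).cpu().detach().numpy() for x in zip(*stats)] # to numpy
My misunderstanding: reasoning from the error message, I thought the cause was that NumPy is CPU-only (data being trained on CUDA cannot be converted to NumPy directly), so converting a GPU tensor to a NumPy array first requires moving it to a CPU tensor, and that this made the tensors run partly on the GPU and partly on the CPU. So I tried three approaches:
Approach 1: convert the NumPy array back into a GPU tensor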

stats = [torch.from_numpy(torch.cat(x, 0).cpu().detach().numpy()).cuda() for x in zip(*stats)]

After running it, the same error was still reported: the tensors were still on two devices, the CPU and the GPU.

Note: .to(device) can target either the CPU or the GPU; .cuda() can only target the GPU.
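Approach 1 converts the NumPy array back with .cuda(). A device-agnostic version of the same round trip might look like the sketch below; the tensor x is illustrative and this is generic PyTorch, not code from yolov5 or the blog. As the author found, this conversion step was not actually the source of the error.

```python
import torch

# Choose the device once. .to(device) works on both CPU-only and GPU machines,
# whereas .cuda() always targets a GPU and fails when none is available.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.randn(3, requires_grad=True, device=device)

# .numpy() only accepts CPU tensors that do not require grad,
# so detach from autograd and copy to host memory first.
arr = x.detach().cpu().numpy()

# torch.from_numpy() always yields a CPU tensor (sharing memory with arr);
# .to(device) then copies it to the GPU when one is in use.
y = torch.from_numpy(arr).to(device)
print(arr.dtype, y.device)
```

Calling .to(device) on a tensor that is already on that device returns the same tensor object, so it is safe to apply unconditionally.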

Approach 2: look for a way to convert a GPU tensor into a NumPy-style array that stays on the GPU, without moving it to the CPU.

CuPy is a library that implements NumPy arrays on NVIDIA GPUs by using CUDA GPU libraries.

Install the CuPy library; for reference, see:
https://wenku.baidu.com/view/ff9563f175eeaeaad1f34693daef5ef7ba0d12db.html

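Approach 3: after removing the step that converts the GPU tensor to a CPU tensor, the original error was still reported. So I concluded that torch.cat(x, 0).cpu().numpy() was not the problem.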
stats = [torch.cat(x, 0) for x in zip(*stats)]

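My guess: a problem with stats?
According to the error message, the failing statement is stats = [torch.cat(x, 0).cpu().numpy() for x in zip(*stats)], and its only data source is the stats list, which is structured as follows: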
① the stats list contains many tuples;

② each tuple contains several tensors.
I used tensor.is_cuda to check whether each tensor was on the GPU.

Finally I found the problem: several of the tensors (printed as tensor([])) were on the CPU, and a check confirmed that these tensors were not None.
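The check described above can be scripted. This is a minimal sketch; report_devices is a hypothetical helper, and it reads tensor.device instead of is_cuda because that also shows which GPU a tensor is on. Note that torch.tensor([]) without a device argument is created on the CPU by default, which fits the CPU-resident tensor([]) entries the author found.

```python
import torch


def report_devices(stats):
    """Hypothetical helper: (tuple_index, item_index, device) for every tensor in a list of tuples."""
    report = []
    for i, group in enumerate(stats):
        for j, t in enumerate(group):
            report.append((i, j, str(t.device)))
    return report


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
stats = [
    (torch.ones(2, device=device), torch.tensor([])),  # torch.tensor([]) lands on the CPU by default
    (torch.zeros(1, device=device), torch.zeros(1, device=device)),
]
for i, j, dev in report_devices(stats):
    print(f"stats[{i}][{j}] on {dev}")
```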

Solution: move the tensors that are on the CPU to the GPU.
I wrote a piece of code that makes sure the CPU tensors are moved to the GPU:

# Move tensors that are on the CPU to the GPU; iterate with range() so the list entries can be reassigned
for i in range(len(stats)):
    # Tuples are immutable but lists are mutable, so convert the tuple to a list before modifying it
    stats[i] = list(stats[i])
    for j in range(len(stats[i])):
        if not stats[i][j].is_cuda:
            stats[i][j] = stats[i][j].cuda()
        # print(stats[i][j].is_cuda)
    stats[i] = tuple(stats[i])

Verification: the CPU tensors were successfully moved to the GPU.
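The loop above can be written more compactly. This is a sketch rather than the blog author's code, and move_stats_to is a hypothetical helper. Because .to(device) leaves tensors already on that device unchanged, no is_cuda check is needed, and the same code also runs on a CPU-only machine. The underlying rule is the same one behind the question's error: every tensor combined in one operation (torch.cat here, indexing in build_targets) must be on the same device.

```python
import torch


def move_stats_to(stats, device):
    """Return a new list where every tensor in every tuple is on `device`.

    .to(device) is a no-op for tensors already there, so no is_cuda check is needed.
    """
    return [tuple(t.to(device) for t in group) for group in stats]


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
stats = [
    (torch.ones(2, device=device), torch.tensor([])),  # the empty tensor starts on the CPU
    (torch.zeros(3), torch.ones(1)),                   # both start on the CPU
]
stats = move_stats_to(stats, device)
# torch.cat now succeeds because every tensor in each group shares one device.
merged = [torch.cat(x, 0) for x in zip(*stats)]
print([m.shape for m in merged])
```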

Related questions:
Error when running: RuntimeError: Given groups=1, weight of size [512, 1024, 1, 1] (python, deep learning, object detection)
2022-12-20 18:01

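1 answer, accepted: Post your yaml so we can see what you changed in your CFP; the channel count is probably set incorrectly. Or press Shift+F9 to set a breakpoint and debug, working out each layer's input and output settings to find which layer is misconfigured.
Training a BiLSTM model fails with RuntimeError: input.size(-1) must be equal to input_size. Expected 768, got 128. Any ideas? (lstm, nlp, python)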
2022-01-13 03:53

1 answer, accepted: change the input_size in self.lstm = nn.LSTM to 768
Error when loading a custom dataset: RuntimeError: stack expects each tensor to be equal size (pycharm, pytorch, deep learning)
2022-07-20 19:00

2 answers, accepted: see whether this works: https://blog.csdn.net/balcklist/article/details/119033591
Fixing RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
2024-03-17 14:42

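Blog of 今天好好学代码了吗: a problem encountered while reproducing the paper DPGN: Distribution Propagation Graph Network for Few-shot Learning. Solution: the code appears to expect the indices to be on the CPU, so change the device to cpu where they are defined.
RuntimeError: Numpy is not available (machine learning, deep learning)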
2023-03-15 10:55

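3 answers, accepted: This answer quotes GPT (OpenAI): This error indicates that your PyTorch module was compiled against API version 0x10, but the currently installed NumPy has API version 0xf. This is usually because the NumPy version
YOLO training error: RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (python, pruning, object detection)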
2022-04-12 16:34

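1 answer, accepted: reduce the batch size, e.g. to 8 or 16
How to handle input and hidden state on different devices: Input and hidden tensors are not at the same device (pytorch, deep learning, natural language processing)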
2022-02-18 13:11

2 answers, accepted: def init_hidden(self): return (torch.randn(2, self.batch, self.hidden_dim // 2)).to(self.device)
yolov7 RuntimeError:indices should be either on cpu or on the same device as the indexed tensor(cpu)
2022-12-28 20:35

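Blog of VR小杰: at line 742 of loss.py, replacing the original device='cpu' with device='cuda:0' fixes the problem
PyTorch error: CUDA error: invalid device function (tensorflow, artificial intelligence, machine learning, deep learning, neural networks)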
2020-09-05 22:14

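1 answer, accepted: check whether the graphics card is compatible, and check the driver, CUDA SDK, and CUDA installations.
Big-integer addition and subtraction with linked lists in C++: the OJ reports Runtime Error: Segmentation fault; please help find the bad data (c++, Q&A, linked list)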
2022-03-30 20:48

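4 answers, accepted: Your logic is too convoluted; the exchange function is unnecessary. Just compare the sizes of a and b and change the argument order when calling. The modified program runs as follows (the screenshot shows spaces between the digits; they have been removed in the code): Code:
RuntimeError: CUDA error: invalid device ordinal (machine learning, deep learning, neural networks)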
2021-05-31 11:10

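2 answers, accepted: Add it at the beginning of the program; the number after it depends on how many GPUs you have and specifies which GPUs this program can see. If there is only one, change it to 0. Then there is the question of how torch selects the GPU. It is best to write: torch.device('cuda:
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu): solution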
2024-04-20 22:52

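Blog of -林哈哈: YOLOv7 codebase: GitHub - bubbliiiing/yolov7-pytorch: This is a yolov7 repository that can be used to train on your own dataset. Solution: 1. Add a line of code at about line 395 of yolo_training.py: fg_mask_inboxes = fg_mask_inboxes.to(torch....
Singly linked list in C runs fine in VS, but with gcc it fails with runtime error: segmentation fault. (c language)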
2019-11-08 22:51

1 answer, accepted: https://blog.csdn.net/com_ma/article/details/78612248
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
2023-05-07 11:55

Blog of 秃头的兔斯基: Transitioning from yolov5 to yolov7, details explained (RobinTian.'s blog, CSDN)
Fixing the yoloV7 error: indices should be either on cpu or on the same device as the indexed tensor (cpu)
2023-11-17 15:51

Blog of 就爱学点yolo: Change to: from_which_layer.append((torch.ones(size=(len(b),)) * i).to(device)) Change to: matching_matrix = torch.zeros_like(cost, device=device) Change: from_which_layer.append(torch.ones(size=(len(b),)) * i)...
Question closed on November 9.
Question created on September 20.
