qwetid 2022-02-11 22:36 Acceptance rate: 0%

How do I fix this error when training YOLOX with PyTorch?

Output and error message

C:/w/b/windows/pytorch/aten/src/ATen/native/cuda/Loss.cu:111: block: [61,0,0], thread: [28,0,0] Assertion input_val >= zero && input_val <= one failed.
C:/w/b/windows/pytorch/aten/src/ATen/native/cuda/Loss.cu:111: block: [61,0,0], thread: [29,0,0] Assertion input_val >= zero && input_val <= one failed.
C:/w/b/windows/pytorch/aten/src/ATen/native/cuda/Loss.cu:111: block: [61,0,0], thread: [30,0,0] Assertion input_val >= zero && input_val <= one failed.
C:/w/b/windows/pytorch/aten/src/ATen/native/cuda/Loss.cu:111: block: [61,0,0], thread: [31,0,0] Assertion input_val >= zero && input_val <= one failed.
0%| | 0/300 [07:18<?, ?it/s]
Traceback (most recent call last):
  File "D:/doc/yolox-pytorch-from-scratch/train.py", line 80, in
    train(model=model,
  File "D:\doc\yolox-pytorch-from-scratch\utils\utils_fit.py", line 94, in train
    loss, val_loss, lr = fit_one_epoch(model, yolo_loss, optimizer, epoch, gen, gen_val, writer)
  File "D:\doc\yolox-pytorch-from-scratch\utils\utils_fit.py", line 24, in fit_one_epoch
    loss_value = yolo_loss(outputs, targets)
  File "C:\Users\wzr\miniconda3\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\doc\yolox-pytorch-from-scratch\nets\yolo_training.py", line 270, in forward
    return self.get_losses(x_shifts, y_shifts, expanded_strides, labels, outputs)
  File "D:\doc\yolox-pytorch-from-scratch\nets\yolo_training.py", line 344, in get_losses
    = self.get_assignments(
  File "C:\Users\wzr\miniconda3\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "D:\doc\yolox-pytorch-from-scratch\nets\yolo_training.py", line 444, in get_assignments
    num_fg, gt_matched_classes, pred_ious_this_matching, matched_gt_inds = dynamic_k_matching(
  File "D:\doc\yolox-pytorch-from-scratch\nets\yolo_training.py", line 206, in dynamic_k_matching
    _, pos_idx = torch.topk(cost[gt_idx], k=dynamic_ks[gt_idx].item(), largest=False)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
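
The last line of the log suggests CUDA_LAUNCH_BLOCKING=1. Below is a minimal sketch of one way to apply it, assuming the lines go at the very top of train.py before torch is imported (that placement is an assumption; setting the variable in the shell before launching the script works as well). With synchronous kernel launches, the Python traceback should point at the call that actually triggered the assertion rather than at a later call such as torch.topk.

```python
import os

# Make CUDA kernel launches synchronous so that a device-side assert is
# reported at the Python line that launched the failing kernel.
# Set this before CUDA is initialised; doing it before `import torch`
# is the simplest way to guarantee that.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch (and the rest of train.py) only after this point
```

Synchronous launches slow training down, so this setting is best kept for the debugging run only.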

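After seeing the line "input_val >= zero && input_val <= one", I looked at the PyTorch source: the assertion comes from the BCE loss computation. But I am using nn.BCEWithLogitsLoss, which should not require inputs between 0 and 1. The labels are fine, since the same data has been used to train other models. Also, training ran normally at first and only raised the error later.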
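
One possible explanation, offered as a hypothesis and not a confirmed diagnosis: the traceback passes through get_assignments. If that function computes a plain binary cross entropy on sigmoid probabilities as part of the matching cost (an assumption, since yolo_training.py is not shown here), the range check applies there regardless of the nn.BCEWithLogitsLoss used for the final loss. The asserted condition also rejects NaN, because every comparison with NaN is false. A network whose outputs turn into NaN after some iterations would therefore train normally at first and fail later, even with valid labels. The pure-Python sketch below mirrors the asserted condition; passes_range_check and sigmoid are illustrative helpers, not PyTorch functions.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function; a NaN input propagates to a NaN output."""
    return 1.0 / (1.0 + math.exp(-x))


def passes_range_check(input_val: float) -> bool:
    """Mirror of the asserted condition: input_val >= zero && input_val <= one."""
    return input_val >= 0.0 and input_val <= 1.0


healthy_logit = 2.0
diverged_logit = float("nan")  # hypothetical logit after training has diverged

print(passes_range_check(sigmoid(healthy_logit)))   # True
print(passes_range_check(sigmoid(diverged_logit)))  # False: NaN fails both comparisons
```

If this turns out to be the cause, checking each model output tensor with torch.isfinite before computing the loss would show the iteration at which the values first become invalid. Lowering the learning rate or clipping gradients are common ways to address it.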

3 answers

  • 中杯可乐多加冰 (Quality Creator in the AI field) 2022-02-12 11:43