yutianCHN 2022-12-21 18:22

cublas runtime error

Training a model with allennlp fails with the following error:
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/THCBlas.cu:331

OS: Linux
Pytorch: 1.2.0
CUDAToolkit: 10.0
allennlp: 0.9.0
NVIDIA-SMI 515.65.01, Driver Version: 515.65.01, CUDA Version: 11.7
GPU: RTX 3090
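
The versions listed above can also be read from inside the Python environment, which helps confirm which CUDA version the installed PyTorch build was compiled against (as opposed to the version the driver reports). The following is a minimal sketch; the helper name `describe_environment` is hypothetical, and it only uses `torch.__version__`, `torch.version.cuda`, and the `torch.cuda` query functions.

```python
def describe_environment():
    """Collect the PyTorch/CUDA facts that matter for this kind of error.

    Hypothetical helper for illustration. Returns a dict; values stay None
    when PyTorch is not installed or no CUDA device is visible.
    """
    info = {"torch": None, "built_with_cuda": None, "gpu": None, "capability": None}
    try:
        import torch
    except ImportError:
        return info

    info["torch"] = torch.__version__
    # CUDA version the PyTorch binary was built with (None for CPU-only builds).
    info["built_with_cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of device 0.
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```

The value to compare is `built_with_cuda` against the GPU's compute capability; the "CUDA Version" shown by `nvidia-smi` only indicates the newest CUDA runtime the driver can serve.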

Full traceback
2022-12-21 18:11:05,577 - INFO - allennlp.training.trainer - Training
  0%|          | 0/16148 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/data/yutian/anaconda3/envs/py37_2/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/run.py", line 18, in run
    main(prog="allennlp")
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 102, in main
    args.func(args)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/commands/train.py", line 124, in train_model_from_args
    args.cache_prefix)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/commands/train.py", line 168, in train_model_from_file
    cache_directory, cache_prefix)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/commands/train.py", line 252, in train_model
    metrics = trainer.train()
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/training/trainer.py", line 478, in train
    train_metrics = self._train_epoch(epoch)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/training/trainer.py", line 320, in _train_epoch
    loss = self.batch_loss(batch_group, for_training=True)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/training/trainer.py", line 261, in batch_loss
    output_dict = self.model(**batch)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "./model.py", line 187, in forward
    joint_embedding = self.word_embedder(joint_tokens)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 118, in forward
    token_vectors = embedder(*tensors, **forward_params_values)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/modules/token_embedders/bert_token_embedder.py", line 175, in forward
    attention_mask=util.combine_initial_dims(input_mask))
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 733, in forward
    output_all_encoded_layers=output_all_encoded_layers)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 406, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 391, in forward
    attention_output = self.attention(hidden_states, attention_mask)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 349, in forward
    self_output = self.self(input_tensor, attention_mask)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 309, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/THCBlas.cu:331
  0%|          | 0/16148 [00:12<?, ?it/s]
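
The traceback ends in a plain `torch.matmul` inside BERT self-attention, so it may be worth checking whether any batched GPU matrix multiply fails in this environment, independent of allennlp and the model. The sketch below assumes nothing about the original model; the function name `try_gpu_matmul` and the tensor shape (2, 12, 8, 64) are illustrative choices only.

```python
def try_gpu_matmul():
    """Run one small batched matmul on the GPU and report what happened.

    Hypothetical diagnostic helper. Returns a short status string instead of
    raising, so it can be run safely in any environment.
    """
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    try:
        # Illustrative shape: (batch, heads, sequence length, head size).
        q = torch.randn(2, 12, 8, 64, device="cuda")
        k = torch.randn(2, 12, 8, 64, device="cuda")
        # Same form as the failing call in the traceback.
        scores = torch.matmul(q, k.transpose(-1, -2))
        # CUDA kernels run asynchronously; synchronize so errors surface here.
        torch.cuda.synchronize()
    except RuntimeError as err:
        return "failed: {}".format(err)
    return "ok, result shape {}".format(tuple(scores.shape))


if __name__ == "__main__":
    print(try_gpu_matmul())
```

If this small test fails with the same cuBLAS error, the problem lies in the PyTorch/CUDA/GPU combination rather than in the training code.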

6 answers

  • 爱晚乏客游 2022-12-22 09:24

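    Did nobody above read the environment details?
    The problem is simple: 30-series GPUs are not supported by CUDA versions earlier than 11.0, but your PyTorch environment uses CUDA 10.0, which is why the cuBLAS call fails. Your driver already supports CUDA 11.7, so leave it alone. You need to reinstall CUDA (the version is determined by the PyTorch release you plan to install) and cuDNN, then install the matching pytorch, torchvision, and torchaudio.
    If your old PyTorch version cannot work with CUDA 11.0 or later, you can either build it from source yourself (lots of pitfalls, hard to get right, and it takes some digging) or upgrade torch to a release built for CUDA 11.0 or later. I recommend upgrading: PyTorch versions are fairly compatible with each other, so you should rarely need to change your source code.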
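
The compatibility rule in this answer can be written down as a small check. This is a simplified sketch under stated assumptions: the table `MIN_CUDA_FOR_ARCH` is hand-written and lists only a few compute capabilities together with the CUDA toolkit release that introduced them, the function name `toolkit_can_target` is hypothetical, and PTX just-in-time compilation is ignored. It relies on the rule that GPU code built for one compute capability also runs on devices with the same major version and an equal or higher minor version.

```python
# Compute capability -> first CUDA toolkit release that can target it.
# Hand-written for illustration; only a few entries are listed.
MIN_CUDA_FOR_ARCH = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30 series)
}


def toolkit_can_target(device_cc, cuda_version):
    """Return True if the toolkit can build native code that runs on the device.

    device_cc and cuda_version are (major, minor) tuples. PTX JIT fallback is
    deliberately not modelled.
    """
    major, minor = device_cc
    for (arch_major, arch_minor), min_cuda in MIN_CUDA_FOR_ARCH.items():
        same_generation = arch_major == major and arch_minor <= minor
        if same_generation and cuda_version >= min_cuda:
            return True
    return False


if __name__ == "__main__":
    rtx_3090 = (8, 6)
    print(toolkit_can_target(rtx_3090, (10, 0)))  # the setup in the question
    print(toolkit_can_target(rtx_3090, (11, 7)))  # a CUDA 11.x build
```

Under these assumptions the check rejects CUDA 10.0 for an RTX 3090 and accepts CUDA 11.0 and later, which matches the advice above to move to a PyTorch release built for CUDA 11.x.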



Question events

  • Closed by the system on December 29
  • Question created on December 21