cublas runtime error
Error when training a model with allennlp:
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/THCBlas.cu:331
OS: Linux
PyTorch: 1.2.0
CUDA Toolkit: 10.0
allennlp: 0.9.0
NVIDIA driver: 515.65.01 (nvidia-smi reports CUDA Version: 11.7)
GPU: RTX 3090
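The environment details above can be confirmed from inside the failing Python environment with a small helper like the one below. It is a minimal sketch and not part of allennlp. `collect_env` is a hypothetical name, and `torch.cuda.get_arch_list` only exists in newer PyTorch releases, so the helper looks it up with `getattr` and skips it on PyTorch 1.2.0. One thing worth checking, offered as an assumption rather than a confirmed diagnosis: an RTX 3090 reports compute capability (8, 6), and CUDA toolkits older than 11.1 do not target that architecture. A PyTorch build against CUDA 10.0 may therefore be unable to run cuBLAS kernels on this card, even though the 515.65.01 driver itself supports CUDA 11.7.

```python
import importlib


def collect_env():
    """Gather the PyTorch/CUDA facts relevant to this report (hypothetical helper)."""
    info = {"torch": None}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this interpreter.
        return info
    info["torch"] = torch.__version__
    # CUDA version PyTorch was *built* against (e.g. "10.0"), not the driver's version.
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        # Compute capability as a (major, minor) tuple, e.g. (8, 6) for an RTX 3090.
        info["capability"] = tuple(torch.cuda.get_device_capability(0))
        # get_arch_list() was added in later PyTorch releases; absent on 1.2.0.
        get_arch_list = getattr(torch.cuda, "get_arch_list", None)
        if get_arch_list is not None:
            info["compiled_arches"] = get_arch_list()
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

If `cuda_build` prints `10.0` while `capability` prints `(8, 6)`, that mismatch is the first thing to rule out before looking at the model code.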
Full error output:
2022-12-21 18:11:05,577 - INFO - allennlp.training.trainer - Training
0%| | 0/16148 [00:00<?, ?it/s]Traceback (most recent call last):
File "/data/yutian/anaconda3/envs/py37_2/bin/allennlp", line 8, in <module>
sys.exit(run())
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/run.py", line 18, in run
main(prog="allennlp")
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 102, in main
args.func(args)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/commands/train.py", line 124, in train_model_from_args
args.cache_prefix)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/commands/train.py", line 168, in train_model_from_file
cache_directory, cache_prefix)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/commands/train.py", line 252, in train_model
metrics = trainer.train()
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/training/trainer.py", line 478, in train
train_metrics = self._train_epoch(epoch)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/training/trainer.py", line 320, in _train_epoch
loss = self.batch_loss(batch_group, for_training=True)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/training/trainer.py", line 261, in batch_loss
output_dict = self.model(**batch)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "./model.py", line 187, in forward
joint_embedding = self.word_embedder(joint_tokens)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 118, in forward
token_vectors = embedder(*tensors, **forward_params_values)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/modules/token_embedders/bert_token_embedder.py", line 175, in forward
attention_mask=util.combine_initial_dims(input_mask))
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 733, in forward
output_all_encoded_layers=output_all_encoded_layers)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 406, in forward
hidden_states = layer_module(hidden_states, attention_mask)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 391, in forward
attention_output = self.attention(hidden_states, attention_mask)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 349, in forward
self_output = self.self(input_tensor, attention_mask)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 309, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/THCBlas.cu:331
0%| | 0/16148 [00:12<?, ?it/s]
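The last frame shows that the failure happens in a plain batched `torch.matmul` inside `pytorch_pretrained_bert`'s self-attention. The minimal sketch below checks whether that operation fails on this GPU independently of allennlp and BERT. `try_attention_matmul` is a hypothetical name, and the tensor shapes are assumptions chosen to resemble BERT-base attention (12 heads, head size 64). If the function raises the same `cublas runtime error`, the problem most likely lies in the PyTorch/CUDA setup rather than in the model code.

```python
import importlib


def try_attention_matmul(batch=2, heads=12, seq_len=16, head_dim=64):
    """Run the same batched matmul as the failing line, outside allennlp.

    Shapes are hypothetical (BERT-base-like). Returns the result shape, or None
    if PyTorch or a CUDA device is not available in this interpreter.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    device = torch.device("cuda")
    query_layer = torch.randn(batch, heads, seq_len, head_dim, device=device)
    key_layer = torch.randn(batch, heads, seq_len, head_dim, device=device)
    # Same call as pytorch_pretrained_bert/modeling.py line 309 in the traceback.
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
    # Wait for the GPU to finish so any asynchronous CUDA/cuBLAS error surfaces here.
    torch.cuda.synchronize()
    return tuple(attention_scores.shape)


if __name__ == "__main__":
    print(try_attention_matmul())
```

On a working setup this prints `(2, 12, 16, 16)`. Running it inside the same `py37_2` environment keeps the comparison with the traceback meaningful.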