Question: In the same configured environment, I test GPU availability with a fresh file, but the GPU is unavailable during training?
I'm running Python 3.9, but the code was taken from a codebase written for Python 3.7 and PyTorch 1.5.
Relevant code
1. Test in a newly created file
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
print(torch.cuda.device_count())
2. In the original training script
from datetime import timedelta  # needed for the init_process_group timeout

if args.local_rank == -1:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    args.n_gpu = torch.cuda.device_count()
else:  # Initialize the distributed backend, which takes care of synchronizing nodes/GPUs
    torch.cuda.set_device(args.local_rank)
    device = torch.device("cuda", args.local_rank)
    torch.distributed.init_process_group(backend='nccl',
                                         timeout=timedelta(minutes=60))
    args.n_gpu = 1
args.device = device
args.nprocs = torch.cuda.device_count()
print(args.device)
print(args.n_gpu)
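The branch above hinges on args.local_rank: it keeps its default of -1 unless a launcher such as torch.distributed.launch spawns the workers and passes --local_rank to each one, so running the script directly always takes the single-process path. A minimal stdlib-only sketch (the argument name is assumed to match the script's parser):

```python
import argparse

# torch.distributed.launch starts one process per GPU and appends
# --local_rank=<rank> to each worker's argv; when the script is run
# directly, the default of -1 is kept, selecting the non-distributed branch.
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)

print(parser.parse_args([]).local_rank)                      # run directly -> -1
print(parser.parse_args(["--local_rank", "0"]).local_rank)   # under the launcher -> 0
```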
Results
1. In the newly created test file
- WARNING - __main__ - Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: True
cpu
0
2. In the original training script
cuda
1
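Given the conflicting results, it is worth confirming that both runs really use the same interpreter and see the same GPUs: a CPU-only torch wheel (where torch.version.cuda is None) or a CUDA_VISIBLE_DEVICES value set by the shell or IDE in only one of the two runs produces exactly this pattern. A small stdlib-only check to add at the top of both scripts:

```python
import os
import sys

# Print which Python each run actually executes and whether the GPUs are masked.
# CUDA_VISIBLE_DEVICES="" or "-1" hides every GPU, making
# torch.cuda.is_available() return False even on a CUDA machine.
print(sys.executable)
print(os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"))
```

If the two runs print different interpreter paths, they are not using the same environment after all.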