手可撩心房 2023-01-13 10:10 采纳率: 25%
浏览 99
已结题

torch问题-(stable diffusion2.0)

在debian11上执行python,关于torch的报错:

Python 3.10.8 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/cuda/__init__.py:83: UserWarning: HIP initialization: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice (Triggered internally at  ../c10/hip/HIPFunctions.cpp:110.)
  return torch._C._cuda_getDeviceCount() > 0
False
>>> 

运行stable diffusion 2.0的时候也是出现了这个错误。

(sd_GPU) root@debian:/home/LYF/stablediffusion/stablediffusion-main# python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt models/ldm/sd_v2/768model.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768  
/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
  warn(f"Failed to load image Python extension: {e}")
Global seed set to 42
Loading model from models/ldm/sd_v2/768model.ckpt
Global Step: 110000
No module 'xformers'. Proceeding without it.
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
  File "/home/LYF/stablediffusion/stablediffusion-main/scripts/txt2img.py", line 289, in <module>
    main(opt)
  File "/home/LYF/stablediffusion/stablediffusion-main/scripts/txt2img.py", line 190, in main
    model = load_model_from_config(config, f"{opt.ckpt}")
  File "/home/LYF/stablediffusion/stablediffusion-main/scripts/txt2img.py", line 43, in load_model_from_config
    model.cuda()
  File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 127, in cuda
    return super().cuda(device=device)
  File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/nn/modules/module.py", line 689, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/nn/modules/module.py", line 602, in _apply
    param_applied = fn(param)
  File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/nn/modules/module.py", line 689, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/root/miniconda3/envs/sd_GPU/lib/python3.10/site-packages/torch/cuda/__init__.py", line 217, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice

用的显卡是GTX 980 Ti 系统Debian11 ,cuda版本:cuda-11.4

(sd_GPU) root@debian:/home/LYF/stablediffusion/stablediffusion-main# nvidia-smi
Fri Jan 13 10:04:08 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 22%   43C    P8    27W / 250W |      1MiB /  6075MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+


所以请问这是什么问题呢?是cuda版本问题吗?还是显卡驱动的问题?非常感谢(◍•ᴗ•◍)

  • 写回答

1条回答 默认 最新

  • 爱晚乏客游 2023-01-13 12:46
    关注

    输入 nvcc -V看下,你截图的这个的cuda根本不是你安装的cuda,而是你目前的驱动支持到cuda11.4。
    总共有三个东西,一个是显卡驱动,这个会显示你目前显卡的驱动可以支持到最高的cuda版本
    另外就是cuda和cudnn,这两个才是神经网络框架要的东西

    例如我下面的截图,nvidia-smi出来的是驱动,版本512.72支持到11.6的cuda,而我实际安装的cuda是11.4

    img


    所以你这个是cuda没有安装,或者cudnn没有安装。具体安装可以搜索下linux下面安装cuda

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论 编辑记录

报告相同问题?

问题事件

  • 系统已结题 1月25日
  • 已采纳回答 1月17日
  • 创建了问题 1月13日

悬赏问题

  • ¥15 selenium获取非固定位置的元素
  • ¥50 手写签名不能上传的问题
  • ¥30 linux odbc怎么添加gbase数据库
  • ¥20 电脑开机黑屏,只有一个鼠标,联想zj者y7000
  • ¥20 DXSDK_jun10
  • ¥20 请问这种量表怎么用spss量化分析(作为中介模型的因变量
  • ¥55 AD844 howland电流源如何驱动大额负载
  • ¥15 C++ /QT 内存权限的判断函数列举
  • ¥15 深度学习GFnet理解问题
  • ¥15 单细胞小提琴堆叠图代码