After installing Docker I ran a test command and it failed. Details below.
Test command:
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Error:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
ERRO[0000] error waiting for container: context canceled
I had planned to ignore that, import the image, and just start it, but that failed too. Details below.
Before starting, I ran this command:
xhost local:root
Output (I took this for an error, though it appears to be xhost's normal confirmation message):
non-network local connections being added to access control list
Then I tried to force the container to start with this command:
sudo docker run --runtime=nvidia --rm -it -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY -e XAUTHORITY -e NVIDIA_DRIVER_CAPABILITIES=all xtdrone:1.3
Error:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
It looks like the problem is that the GPU cannot be reached.
I ran this command:
nvidia-smi
Error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
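Both messages above point at the host rather than at Docker: the NVIDIA kernel module does not seem to be loaded. A minimal sketch of host-side checks follows, assuming an Ubuntu host. `nvidia_module_loaded` is a hypothetical helper name, and `mokutil` may not be installed, so that call is guarded.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the nvidia kernel module is listed.
# It reads /proc/modules-style text on stdin, so it can be tried without a GPU.
nvidia_module_loaded() {
    # A loaded module appears as a line starting with "nvidia ".
    grep -q '^nvidia '
}

if [ -r /proc/modules ] && nvidia_module_loaded < /proc/modules; then
    echo "nvidia module: loaded"
else
    echo "nvidia module: NOT loaded"
fi

# Kernel messages often say why loading failed (may need sudo to read).
dmesg 2>/dev/null | grep -i -E 'nvidia|nvrm' | tail -n 20 || true

# Secure Boot can block unsigned modules; mokutil may be absent.
if command -v mokutil >/dev/null 2>&1; then
    mokutil --sb-state || true
fi
```

If the module is reported as not loaded, the container errors are expected, and the kernel log lines are the next place to look.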
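I downgraded the kernel to 5.4.0-80-generic and gcc and g++ to 7, then reinstalled the driver. The driver in the Docker image is 470, so I have to install the matching driver version and cannot use the latest one. I still get the same errors as above.
Some tutorials say to select the driver in Software & Updates, but mine lists nothing there.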
My kernel: Linux 5.15.0-56-generic
Driver version: 470.161.03
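Two kernel versions are mentioned above (5.4.0-80-generic and 5.15.0-56-generic). It may be worth checking whether the nvidia module was built for the kernel that is actually running. A rough sketch, assuming `modinfo` from kmod is available; `vermagic_matches` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a vermagic string (as printed by
# `modinfo -F vermagic`) was built for the given kernel release.
vermagic_matches() {
    # $1 = kernel release, e.g. 5.15.0-56-generic
    # $2 = vermagic string; its first word is the kernel release
    [ "${2%% *}" = "$1" ]
}

running="$(uname -r)"
echo "running kernel: $running"

# modinfo looks the module up for the running kernel and fails if none exists.
if command -v modinfo >/dev/null 2>&1 && vm="$(modinfo -F vermagic nvidia 2>/dev/null)"; then
    if vermagic_matches "$running" "$vm"; then
        echo "nvidia module was built for the running kernel"
    else
        echo "nvidia module was built for: ${vm%% *}"
    fi
else
    echo "no nvidia module found for $running"
fi
```

If no module is found for the running kernel, the driver was probably built while a different kernel was booted. In that case, reinstalling it under the kernel you actually boot into would be the first thing to try.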