王太歌 2025-07-09 18:53

How to install the NVIDIA GPU driver in a CentOS 7 VM on VMware ESXi

I installed a VM on VMware ESXi 7.0.2 running CentOS 7 (kernel 3.10.0-1160.el7.x86_64). I need to install the NVIDIA GPU driver in this VM, but after running the .run installer, nvidia-smi reports: "No devices were found".

I have read a lot of material online and changed many settings, including but not limited to:

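  • Installed the basic build dependencies, etc.
  • Enabled passthrough for the GPUs on the ESXi management page
  • Disabled nouveau
  • Tried several different driver versions
  • Added these parameters in the VM settings under VM Options > Advanced > Configuration Parameters:
  1. pciPassthru.64bitMMIOSizeGB = 64
  2. pciPassthru.use64bitMMIO = TRUE
  • Confirmed that the driver was built for the running kernel version: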

# uname -r
3.10.0-1160.el7.x86_64
# modinfo nvidia | grep vermagic
vermagic:       3.10.0-1160.el7.x86_64 SMP mod_unload modversions 
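The last two checks in this list can be scripted. The sketch below assumes the rule of thumb often given in VMware's VMDirectPath I/O guidance: pciPassthru.64bitMMIOSizeGB should be at least the total memory of all passed-through GPUs, rounded up to the next power of two. `mmio_size_gb` and `vermagic_matches` are hypothetical helper names, not part of any tool.

```shell
# Source this file, then call the helpers.

# mmio_size_gb NUM_GPUS MEM_PER_GPU_GB
# ASSUMPTION: rule of thumb "total GPU memory, rounded up to the next
# power of two" for pciPassthru.64bitMMIOSizeGB.
mmio_size_gb() {
    local total=$(( $1 * $2 )) size=1
    # Double until size covers the total
    while [ "$size" -lt "$total" ]; do
        size=$(( size * 2 ))
    done
    echo "$size"
}

# vermagic_matches KERNEL_RELEASE VERMAGIC
# Succeeds if the first field of the module's vermagic string equals the
# running kernel release (e.g. the output of `uname -r`).
vermagic_matches() {
    local release=$1 vm
    # Drop an optional "vermagic:" prefix, then split on whitespace
    vm=${2#vermagic:}
    set -- $vm
    [ "$1" = "$release" ]
}
```

For two 24 GB Tesla P40s, `mmio_size_gb 2 24` prints 64, the value used above. The kernel check can be run as `vermagic_matches "$(uname -r)" "$(modinfo -F vermagic nvidia)" && echo match`.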

But the final check still reports: No devices were found

Here is the output of each status check:
lsmod | grep nouveau
(no output)
lspci | grep -i nvidia
0b:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
13:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
lsmod | grep nvidia
nvidia_uvm           1287347  0 
nvidia_drm             58061  0 
nvidia_modeset       1298897  1 nvidia_drm
nvidia              56742045  2 nvidia_modeset,nvidia_uvm
drm_kms_helper        186531  2 vmwgfx,nvidia_drm
drm                   456166  6 ttm,drm_kms_helper,nvidia,vmwgfx,nvidia_drm
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.247.01  Wed Mar 26 11:50:32 UTC 2025
GCC version:  gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)
ls -l /dev/nvidia*
crw-rw-rw-. 1 root root 195,   0 Jul  9 17:36 /dev/nvidia0
crw-rw-rw-. 1 root root 195,   1 Jul  9 17:36 /dev/nvidia1
crw-rw-rw-. 1 root root 195, 255 Jul  9 17:36 /dev/nvidiactl
crw-rw-rw-. 1 root root 236,   0 Jul  9 17:36 /dev/nvidia-uvm
crw-rw-rw-. 1 root root 236,   1 Jul  9 17:36 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
cr--------. 1 root root 239, 1 Jul  9 17:36 nvidia-cap1
cr--r--r--. 1 root root 239, 2 Jul  9 17:36 nvidia-cap2
dmesg | grep -i iommu
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-1160.el7.x86_64 root=/dev/mapper/centos_ai-root ro crashkernel=auto rd.lvm.lv=centos_ai/root rd.lvm.lv=centos_ai/swap rhgb quiet intel_iommu=on iommu=pt rd.driver.blacklist=nouveau
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.10.0-1160.el7.x86_64 root=/dev/mapper/centos_ai-root ro crashkernel=auto rd.lvm.lv=centos_ai/root rd.lvm.lv=centos_ai/swap rhgb quiet intel_iommu=on iommu=pt rd.driver.blacklist=nouveau
[    0.000000] DMAR: IOMMU enabled
dmesg | grep -i nvrm
[    4.441939] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
[    4.442554] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[    4.442602] NVRM: The NVIDIA probe routine failed for 2 device(s).
[    4.442605] NVRM: None of the NVIDIA devices were initialized.
[    4.570555] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
[    4.570583] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[    4.570607] NVRM: The NVIDIA probe routine failed for 2 device(s).
[    4.570608] NVRM: None of the NVIDIA devices were initialized.
[    5.881807] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
[    5.881833] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[    5.881854] NVRM: The NVIDIA probe routine failed for 2 device(s).
[    5.881855] NVRM: None of the NVIDIA devices were initialized.
[    6.108337] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
[    6.108399] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[    6.108444] NVRM: The NVIDIA probe routine failed for 2 device(s).
[    6.108447] NVRM: None of the NVIDIA devices were initialized.
[   11.955199] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
[   11.955228] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[   11.955251] NVRM: The NVIDIA probe routine failed for 2 device(s).
[   11.955252] NVRM: None of the NVIDIA devices were initialized.
[   12.458272] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
[   12.458303] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[   12.458327] NVRM: The NVIDIA probe routine failed for 2 device(s).
[   12.458329] NVRM: None of the NVIDIA devices were initialized.
[   13.464136] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
[   13.464165] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[   13.464187] NVRM: The NVIDIA probe routine failed for 2 device(s).
[   13.464189] NVRM: None of the NVIDIA devices were initialized.
[   14.067732] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
[   14.067758] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[   14.067781] NVRM: The NVIDIA probe routine failed for 2 device(s).
[   14.067782] NVRM: None of the NVIDIA devices were initialized.
[  501.632900] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
[  501.632952] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[  501.633008] NVRM: The NVIDIA probe routine failed for 2 device(s).
[  501.633010] NVRM: None of the NVIDIA devices were initialized.
[  606.357046] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
[  606.357105] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[  606.357171] NVRM: The NVIDIA probe routine failed for 2 device(s).
[  606.357174] NVRM: None of the NVIDIA devices were initialized.
[  691.541885] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
[  691.541940] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[  691.541994] NVRM: The NVIDIA probe routine failed for 2 device(s).
[  691.541997] NVRM: None of the NVIDIA devices were initialized.
[  781.468371] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
[  781.468377] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR2 is 0M @ 0x0 (PCI:0000:0b:00.0)
[  781.468395] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR5 is 0M @ 0x0 (PCI:0000:0b:00.0)
[  781.579751] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[  781.579760] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR2 is 0M @ 0x0 (PCI:0000:13:00.0)
[  781.579785] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR5 is 0M @ 0x0 (PCI:0000:13:00.0)
[  781.690427] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.247.01  Wed Mar 26 11:50:32 UTC 2025
[  816.478975] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
[  816.478985] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR2 is 0M @ 0x0 (PCI:0000:0b:00.0)
[  816.479011] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR5 is 0M @ 0x0 (PCI:0000:0b:00.0)
[  816.589390] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:13:00.0)
[  816.589399] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR2 is 0M @ 0x0 (PCI:0000:13:00.0)
[  816.589425] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR5 is 0M @ 0x0 (PCI:0000:13:00.0)
[  816.699807] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.247.01  Wed Mar 26 11:50:32 UTC 2025
[  830.034843] NVRM: GPU 0000:0b:00.0: RmInitAdapter failed! (0x24:0x72:1447)
[  830.034927] NVRM: GPU 0000:0b:00.0: rm_init_adapter failed, device minor number 0
[  830.345331] NVRM: GPU 0000:0b:00.0: RmInitAdapter failed! (0x24:0x72:1447)
[  830.345420] NVRM: GPU 0000:0b:00.0: rm_init_adapter failed, device minor number 0
[  830.938662] NVRM: GPU 0000:13:00.0: RmInitAdapter failed! (0x24:0x72:1447)
[  830.938735] NVRM: GPU 0000:13:00.0: rm_init_adapter failed, device minor number 1
[  831.254334] NVRM: GPU 0000:13:00.0: RmInitAdapter failed! (0x24:0x72:1447)
[  831.254413] NVRM: GPU 0000:13:00.0: rm_init_adapter failed, device minor number 1
[ 2885.867783] NVRM: GPU 0000:0b:00.0: RmInitAdapter failed! (0x24:0x72:1447)
[ 2885.867875] NVRM: GPU 0000:0b:00.0: rm_init_adapter failed, device minor number 0
[ 2886.179362] NVRM: GPU 0000:0b:00.0: RmInitAdapter failed! (0x24:0x72:1447)
[ 2886.179450] NVRM: GPU 0000:0b:00.0: rm_init_adapter failed, device minor number 0
[ 2886.496928] NVRM: GPU 0000:13:00.0: RmInitAdapter failed! (0x24:0x72:1447)
[ 2886.497021] NVRM: GPU 0000:13:00.0: rm_init_adapter failed, device minor number 1
[ 2886.813693] NVRM: GPU 0000:13:00.0: RmInitAdapter failed! (0x24:0x72:1447)
[ 2886.813773] NVRM: GPU 0000:13:00.0: rm_init_adapter failed, device minor number 1
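The repeated NVRM messages above are easier to read when condensed per device. A minimal sketch, assuming only the two message formats visible in this log (`NVRM: BARn is 0M ... (PCI:...)` and `NVRM: GPU ...: RmInitAdapter failed!`); `nvrm_summary` is a hypothetical helper name.

```shell
# nvrm_summary: read kernel log text on stdin and print, per PCI address,
# how many invalid-BAR reports and RmInitAdapter failures were logged.
# ASSUMPTION: message formats match the NVRM lines shown in this log.
nvrm_summary() {
    awk '
        # e.g. "NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)"
        /NVRM: BAR[0-9]+ is 0M/ {
            if (match($0, /PCI:[0-9a-fA-F:.]+/)) {
                dev = substr($0, RSTART + 4, RLENGTH - 4)
                bar[dev]++
                seen[dev] = 1
            }
        }
        # e.g. "NVRM: GPU 0000:0b:00.0: RmInitAdapter failed! (0x24:0x72:1447)"
        /RmInitAdapter failed/ {
            for (i = 1; i < NF; i++) if ($i == "GPU") dev = $(i + 1)
            sub(/:$/, "", dev)
            rm[dev]++
            seen[dev] = 1
        }
        END {
            for (d in seen)
                printf "%s invalid_bar=%d rminit_failed=%d\n", d, bar[d] + 0, rm[d] + 0
        }
    ' | sort
}
```

Running `dmesg | nvrm_summary` prints one line per PCI address; for the log above, both 0b:00.0 and 13:00.0 would show invalid-BAR reports as well as RmInitAdapter failures.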

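It looks as if no memory address range was assigned to the GPUs? I have searched around without finding any leads, and asking AI did not solve it either. What should I investigate next, and how might this be fixed?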
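To check directly whether the guest has an address range for each GPU BAR, the kernel's sysfs `resource` file for the device can be read. A minimal sketch, assuming the standard Linux layout of that file (one `start end flags` hex triple per line, BAR0 to BAR5 first); `bar_sizes` is a hypothetical helper name. A 64-bit BAR occupies two slots, so the slot after it is normally empty even on a healthy device.

```shell
# bar_sizes RESOURCE_FILE
# Print the size of BAR0..BAR5 from a PCI sysfs "resource" file.
# A zero start address means no range is mapped in that slot: either the
# firmware did not assign the BAR, or the slot is the upper half of the
# preceding 64-bit BAR.
bar_sizes() {
    local n=0 start end flags size
    while read -r start end flags && [ "$n" -le 5 ]; do
        if [ $(( $start )) -eq 0 ]; then
            echo "BAR$n empty"
        else
            size=$(( $end - $start + 1 ))
            if [ "$size" -ge 1048576 ]; then
                echo "BAR$n $(( size / 1048576 ))M"
            else
                echo "BAR$n ${size}B"
            fi
        fi
        n=$(( n + 1 ))
    done < "$1"
}
```

For the first P40 this would be `bar_sizes /sys/bus/pci/devices/0000:0b:00.0/resource`; `lspci -vv -s 0b:00.0` shows the same regions from the PCI configuration side.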


  • 阿里嘎多学长 2025-07-09 18:53

    AI-generated (AIGC) answer compiled by 阿里嘎多学长.

    Solution

    Installing the NVIDIA driver in a VM on VMware ESXi requires some special handling. Here is a solution:

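    1. Make sure the VM supports GPU acceleration: in VMware ESXi, GPU acceleration must be enabled in the VM's configuration. In the VM settings, click "Hardware", select "Display", and check "Accelerate 3D graphics".
    2. Install the NVIDIA driver: inside the VM, install it with the following command:
    sudo yum install kmod-nvidia-384

    Here kmod-nvidia-384 is the package name of the NVIDIA driver; choose the version that fits your setup.

    3. Install the NVIDIA driver manually: if the command above fails, install the driver by hand. Download the NVIDIA driver .run file and run:
    sudo sh NVIDIA-Linux-x86_64-384.130.run

    Here NVIDIA-Linux-x86_64-384.130.run is the name of the driver's .run file; choose the version that fits your setup.

    4. Reboot the VM: after installation, reboot the VM so that the new NVIDIA driver takes effect.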
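After the reboot in the last step, `nvidia-smi -L` gives a quick pass/fail check. A minimal sketch that counts the GPUs it lists, assuming its usual one-line-per-device output (`GPU 0: <name> (UUID: ...)`); `gpu_count` is a hypothetical helper name.

```shell
# gpu_count: read `nvidia-smi -L` output on stdin and print how many GPUs
# the driver enumerated. ASSUMPTION: each device is listed on a line
# starting with "GPU <index>:"; anything else (such as
# "No devices were found") counts as zero.
gpu_count() {
    # grep -c prints 0 but exits 1 when nothing matches; keep exit status 0
    grep -c '^GPU [0-9][0-9]*:' || true
}
```

On a VM where both P40s initialize, `nvidia-smi -L | gpu_count` should print 2; in the failure described in the question it prints 0.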

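    Result

    By following the steps above, you should be able to install the NVIDIA driver in your CentOS 7 VM. If you still run into problems, check the VM's log files or contact VMware support for further help.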
