
NVIDIA Transitions Fully Towards Open-Source GPU Kernel Modules


With the R515 driver, NVIDIA released a set of Linux GPU kernel modules in May 2022 as open source with dual GPL and MIT licensing. The initial release targeted datacenter compute GPUs, with GeForce and Workstation GPUs in an alpha state. 

At the time, we announced that more robust and fully-featured GeForce and Workstation Linux support would follow in subsequent releases and the NVIDIA Open Kernel Modules would eventually supplant the closed-source driver. 

NVIDIA GPUs share a common driver architecture and capability set. The same driver for your desktop or laptop runs the world’s most advanced AI workloads in the cloud. It’s been incredibly important to us that we get it just right. 

Two years on, we’ve achieved equivalent or better application performance with our open-source GPU kernel modules and added substantial new capabilities:

  • Heterogeneous memory management (HMM) support
  • Confidential computing
  • The coherent memory architectures of our Grace platforms
  • And more

We’re now at a point where transitioning fully to the open-source GPU kernel modules is the right move, and we’re making that change in the upcoming R560 driver release.

Supported GPUs

Not every GPU is compatible with the open-source GPU kernel modules.

For cutting-edge platforms such as NVIDIA Grace Hopper or NVIDIA Blackwell, you must use the open-source GPU kernel modules. The proprietary drivers are unsupported on these platforms.

For newer GPUs from the Turing, Ampere, Ada Lovelace, or Hopper architectures, NVIDIA recommends switching to the open-source GPU kernel modules.

For older GPUs from the Maxwell, Pascal, or Volta architectures, the open-source GPU kernel modules are not compatible with your platform. Continue to use the NVIDIA proprietary driver.

For mixed deployments with older and newer GPUs in the same system, continue to use the proprietary driver.

If you are not sure, NVIDIA provides a new detection helper script to guide you to the right driver. For more information, see the Using the installation helper script section later in this post.
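
The compatibility rules in this section can be summarized as a small lookup. The following is a minimal sketch, not the NVIDIA detection script: the function names recommend_module_type and recommend_for_system are hypothetical, and the architecture names are matched as plain strings that you supply yourself.

```shell
# Hypothetical helper (not part of any NVIDIA tool): maps a GPU
# architecture name to the kernel module flavor described in this post.
recommend_module_type() {
    # Lowercase the argument so "Turing" and "turing" both match.
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        "grace hopper"|blackwell)
            echo "open (required)" ;;
        turing|ampere|"ada lovelace"|hopper)
            echo "open (recommended)" ;;
        maxwell|pascal|volta)
            echo "proprietary" ;;
        *)
            echo "unknown" ;;
    esac
}

# Hypothetical helper for a whole system: one older GPU next to newer
# ones means staying on the proprietary driver.
recommend_for_system() {
    [ "$#" -gt 0 ] || { echo "unknown"; return 0; }
    needs_open=0
    needs_proprietary=0
    for gpu in "$@"; do
        case "$(recommend_module_type "$gpu")" in
            "open (required)") needs_open=1 ;;
            proprietary)       needs_proprietary=1 ;;
            unknown)           echo "unknown"; return 0 ;;
        esac
    done
    if [ "$needs_open" -eq 1 ] && [ "$needs_proprietary" -eq 1 ]; then
        # Assumption: this combination is not covered by the post.
        echo "unknown"
    elif [ "$needs_proprietary" -eq 1 ]; then
        echo "proprietary"
    else
        echo "open"
    fi
}

recommend_for_system Ampere Pascal
```

The last line illustrates the mixed-deployment rule: an Ampere GPU alongside a Pascal GPU resolves to the proprietary driver. For a real system, prefer the helper script that NVIDIA provides.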

Installer changes

In general, the default version of the driver installed by all installation methods is switching from the proprietary driver to the open-source driver. There are a few specific scenarios that deserve special attention:

  • Package managers with the CUDA metapackage
  • Runfile
  • Installation helper script
  • Package manager details
  • Windows Subsystem for Linux
  • CUDA Toolkit

Using package managers with the CUDA metapackage

When you are installing CUDA Toolkit using a package manager (not the .run file), installation metapackages exist and are commonly used. By installing a top-level cuda package, you install a combination of CUDA Toolkit and the associated driver release. For example, by installing cuda during the CUDA 12.5 release time frame, you get the proprietary NVIDIA driver 555 along with CUDA Toolkit 12.5. 

Figure 1 shows this package structure.

Diagram shows the flow of installing CUDA software that includes installing both the nvidia-driver-555 and cuda-toolkit-12.5 modules.
Figure 1. CUDA packages before CUDA Toolkit 12.6

Previously, using the open-source GPU kernel modules meant that you could not use the top-level metapackage. You had to install the distro-specific NVIDIA driver open package along with the cuda-toolkit-X-Y package of your choice.

Beginning with the CUDA 12.6 release, the flow effectively switches places (Figure 2).

Diagram shows the revised flow of installing CUDA software, where the nvidia-driver-open-560 and cuda-toolkit-12.6 modules are installed instead.
Figure 2. CUDA packages after the CUDA Toolkit 12.6 release
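
As a rough illustration of the switch shown in Figures 1 and 2, the following sketch reports which driver flavor the top-level cuda metapackage pulls in for a given CUDA Toolkit version. The function name metapackage_driver_flavor is hypothetical, and the only rule it encodes is the one stated above: proprietary before 12.6, open from 12.6 onward.

```shell
# Hypothetical helper: prints the driver flavor that the top-level cuda
# metapackage installs for a CUDA Toolkit version such as 12.5 or 12.6.
metapackage_driver_flavor() {
    version=$1
    major=${version%%.*}
    case "$version" in
        *.*) minor=${version#*.}; minor=${minor%%.*} ;;
        *)   minor=0 ;;   # bare major version, treat minor as 0
    esac
    if [ "$major" -gt 12 ] || { [ "$major" -eq 12 ] && [ "$minor" -ge 6 ]; }; then
        echo "open"
    else
        echo "proprietary"
    fi
}

metapackage_driver_flavor 12.5
metapackage_driver_flavor 12.6
```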

Using the runfile

If you install CUDA or the NVIDIA drivers using the .run file, the installer queries your hardware and automatically installs the best-fit driver for your system. UI toggles are also available to select between the proprietary driver and the open-source driver.

If you’re installing through the CUDA .run file and using the ncurses user interface, you now see a menu similar to the following:

┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Driver                                                                  │
│   [ ] Do not install any of the OpenGL-related driver files                  │
│   [ ] Do not install the nvidia-drm kernel module                            │
│   [ ] Update the system X config file to use the NVIDIA X driver             │
│ - [X] Override kernel module type                                            │
│      [X] proprietary                                                         │
│      [ ] open                                                                │
│   Change directory containing the kernel source files                        │
│   Change kernel object output directory                                      │
│   Done                                                                       │
│                                                                              │
│                                                                              │
│                                                                              │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
└──────────────────────────────────────────────────────────────────────────────┘

If you’re installing through the driver .run file, you see a similar choice presented (Figure 3).

Screenshot shows the user interface highlighting two buttons labeled NVIDIA Proprietary and MIT/GPL, respectively. It suggests the user choose the MIT/GPL button to install the correct kernel module type.
Figure 3. New runfile interactive selection (driver installer)

You can also pass overrides on the command line, either to install without the user interface or to drive the installer from automation tools such as Ansible.

# sh ./cuda_12.6.0_560.22_linux.run --override --kernel-module-type=proprietary

# sh ./NVIDIA-Linux-x86_64-560.run --kernel-module-type=proprietary
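
For automation, it can help to validate the override before invoking the installer. The following is a minimal dry-run sketch: the function name runfile_command is hypothetical, and the value open is assumed to be accepted by --kernel-module-type because it is the other choice in the menu shown earlier. The function only prints the command and does not run the installer.

```shell
# Hypothetical wrapper: checks the requested kernel module type and
# prints the runfile command instead of executing it (dry run).
runfile_command() {
    runfile=$1
    module_type=$2
    case "$module_type" in
        open|proprietary) ;;
        *)
            echo "error: module type must be 'open' or 'proprietary'" >&2
            return 1 ;;
    esac
    printf 'sh %s --kernel-module-type=%s\n' "$runfile" "$module_type"
}

runfile_command ./NVIDIA-Linux-x86_64-560.run open
```

Printing the command first makes it easy to review in a playbook or CI log before running it as root.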

Using the installation helper script

As mentioned earlier, if you’re unsure which driver to pick for the GPUs in your system, NVIDIA created a helper script to guide you through the selection process. 

To use it, first install the nvidia-driver-assistant package with your package manager, then run the script:

$ nvidia-driver-assistant

Package manager details

For a consistent experience, NVIDIA recommends that you use package managers to install CUDA Toolkit and the drivers. However, the package management system and the way packages are structured vary by distribution.

This section outlines the specific details, caveats, or migration steps needed for various platforms. 

apt: Ubuntu and Debian-based distributions

Run the following command:

$ sudo apt-get install nvidia-open

To upgrade using the cuda metapackage on Ubuntu 20.04, first switch to open kernel modules:

$ sudo apt-get install -V nvidia-kernel-source-open

$ sudo apt-get install nvidia-open

dnf: Red Hat Enterprise Linux, Fedora, Kylin, Amazon Linux, or Rocky Linux

Run the following command:

$ sudo dnf module install nvidia-driver:open-dkms

To upgrade using the cuda metapackage on dnf-based distros, module streams must be disabled:

$ echo "module_hotfixes=1" | sudo tee -a /etc/yum.repos.d/cuda*.repo
$ sudo dnf install --allowerasing nvidia-open
$ sudo dnf module reset nvidia-driver

zypper: SUSE Linux Enterprise Server or openSUSE

Run one of the following commands:

# default kernel flavor
$ sudo zypper install nvidia-open
# azure kernel flavor (sles15/x86_64)
$ sudo zypper install nvidia-open-azure
# 64kb kernel flavor (sles15/sbsa) required for Grace-Hopper
$ sudo zypper install nvidia-open-64k
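
To pick between the three packages above in a script, one option is to look at the running kernel flavor. This sketch assumes that the SUSE kernel release string reported by uname -r ends in the flavor name (for example -default, -azure, or -64kb); that suffix convention and the function name zypper_open_package are assumptions, not something stated in this post.

```shell
# Hypothetical helper: maps a kernel release string to the matching
# open driver package name listed above.
# Assumption: the release string ends in the kernel flavor name.
zypper_open_package() {
    case "$1" in
        *-azure) echo "nvidia-open-azure" ;;
        *-64kb)  echo "nvidia-open-64k" ;;
        *)       echo "nvidia-open" ;;
    esac
}

# Print the package that matches the running kernel.
zypper_open_package "$(uname -r)"
```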

Package manager summary

To simplify, Table 1 condenses the package manager recommendations. All releases beyond driver version 560 and CUDA Toolkit 12.6 will use these packaging conventions.

Distro             | Install the latest                          | Install a specific release
Fedora/RHEL/Kylin  | dnf module install nvidia-driver:open-dkms  | dnf module install nvidia-driver:560-open
openSUSE/SLES      | zypper install nvidia-open{-azure,-64k}     | zypper install nvidia-open-560{-azure,-64k}
Debian             | apt-get install nvidia-open                 | apt-get install nvidia-open-560
Ubuntu             | apt-get install nvidia-open                 | apt-get install nvidia-open-560
Table 1. Package manager installation recommendations

For more information, see NVIDIA Datacenter Drivers.
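
The conventions in Table 1 lend themselves to a small lookup in provisioning scripts. The following is a sketch under stated assumptions: the function names detect_package_manager and open_driver_install_cmd are hypothetical, the default kernel flavor is assumed for zypper, and the function only prints a command for you to review and run with the appropriate privileges.

```shell
# Hypothetical helper: prints the first supported package manager found
# on the PATH, or fails if none is available.
detect_package_manager() {
    for candidate in apt-get dnf zypper; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: prints the Table 1 install command.
#   $1 = package manager (apt-get, dnf, or zypper)
#   $2 = optional driver release such as 560; empty means latest
open_driver_install_cmd() {
    pm=$1
    release=$2
    case "$pm" in
        dnf)
            if [ -n "$release" ]; then
                echo "dnf module install nvidia-driver:${release}-open"
            else
                echo "dnf module install nvidia-driver:open-dkms"
            fi ;;
        zypper|apt-get)
            # Appends "-<release>" only when a release was given.
            echo "$pm install nvidia-open${release:+-$release}" ;;
        *)
            echo "error: unsupported package manager: $pm" >&2
            return 1 ;;
    esac
}

open_driver_install_cmd dnf 560
open_driver_install_cmd apt-get
```

On a live system, the two helpers can be combined as open_driver_install_cmd "$(detect_package_manager)".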

Windows Subsystem for Linux

Windows Subsystem for Linux (WSL) uses the NVIDIA kernel driver from the host Windows operating system. You shouldn’t install any driver into this platform specifically. If you are using WSL, no change or action is required.
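
If your provisioning scripts run on both native Linux and WSL, a guard can skip the driver step on WSL. The sketch below relies on a common heuristic rather than anything stated in this post: WSL kernels typically include the string "microsoft" in /proc/version. The function name is_wsl_version_string is hypothetical.

```shell
# Hypothetical helper: succeeds when a kernel version string looks like
# a WSL kernel. Heuristic assumption: WSL kernels mention "microsoft".
is_wsl_version_string() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *microsoft*) return 0 ;;
        *)           return 1 ;;
    esac
}

# On a live system, skip Linux driver installation when WSL is detected.
if [ -r /proc/version ] && is_wsl_version_string "$(cat /proc/version)"; then
    echo "WSL detected: do not install a Linux GPU driver here"
fi
```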

CUDA Toolkit

The installation of CUDA Toolkit through package managers remains unchanged. Run the following command, using the package manager for your distribution (apt-get, dnf, or zypper):

$ sudo apt-get/dnf/zypper install cuda-toolkit

More information

For more information about how to install NVIDIA drivers or the CUDA Toolkit, including how to ensure that you install the proprietary drivers if you’re unable to migrate to the open-source GPU kernel modules at this time, see Driver Installation in the CUDA Installation Guide.
