How to use NVIDIA GPU on podman (RHEL 9 / Fedora 37)

Jan 11, 2023

Podman is a container engine for developing, managing, and running containers on your Linux system. With support for NVIDIA GPUs, you can run GPU-accelerated workloads inside your containers, making it a great option for machine learning and other high-performance computing tasks.

In this guide, we will cover the necessary steps to set up your server, including installing the necessary drivers and software, configuring the system to recognize the GPU, and running your first container with GPU support.

Installing NVIDIA drivers

This guide assumes you are using RHEL 9 or Fedora 37.

1. Make sure you have third-party packages enabled:
https://docs.fedoraproject.org/en-US/workstation-working-group/third-party-repos/
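
On Fedora, the NVIDIA driver packages used below come from RPM Fusion. One way to enable it (a sketch for a stock Fedora install, following rpmfusion.org/Configuration) is:

sudo dnf install \
    https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm \
    https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm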

2. Then, install the `akmod-nvidia` package:

sudo dnf install akmod-nvidia
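
The kernel module is built in the background by akmods and can take a few minutes. Optionally, you can confirm the module was built before rebooting (if this prints a driver version, the build is done):

modinfo -F version nvidia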

3. Great. Now, make sure you restart your machine.

4. After restart, install the `xorg-x11-drv-nvidia-cuda` package:

sudo dnf install xorg-x11-drv-nvidia-cuda

5. Test if your NVIDIA GPU is working:

nvidia-smi -L

# You should get a result like this:
# GPU 0: NVIDIA GeForce RTX 3070 (UUID: GPU-...)

Install the nvidia-container-toolkit

I wrote an Ansible playbook that sets up everything automatically. Just save the playbook below as ./playbook-install-nvidia-container-toolkit-podman.yaml and execute it with the ansible-playbook CLI, as shown here:

# Install Ansible
sudo dnf install -y ansible

# Run the playbook (you need to provide the "sudo" password)
ansible-playbook playbook-install-nvidia-container-toolkit-podman.yaml --ask-become-pass

# ./playbook-install-nvidia-container-toolkit-podman.yaml

---

- name: Install nvidia-container-toolkit for podman
  hosts: localhost
  connection: local

  vars:
    # For Fedora 37, use the RHEL 9 repo
    distribution: rhel9.0

    # Image used to test nvidia-container-toolkit with podman
    test_container_image: docker.io/nvidia/cuda:11.6.2-base-ubuntu20.04

  tasks:
    # -- Preflight checks
    - name: Preflight checks (GPU found)
      block:
        - name: Check if GPU is available
          ansible.builtin.shell: nvidia-smi -L
          register: nvidia_smi_L
          changed_when: false
          failed_when: "'UUID: GPU-' not in nvidia_smi_L.stdout"
      rescue:
        - name: ERROR NVIDIA GPU not found
          ansible.builtin.fail:
            msg: "ERROR: NVIDIA GPU not found. Please check if the GPU is available."

    # -- Install
    - name: Install nvidia-container-toolkit and podman
      block:
        - name: Add nvidia-docker repo
          become: true
          ansible.builtin.get_url:
            url: https://nvidia.github.io/nvidia-docker/{{ distribution }}/nvidia-docker.repo
            dest: /etc/yum.repos.d/nvidia-container-toolkit.repo
            mode: '0644'

        - name: Install xorg-x11-drv-nvidia
          block:
            - name: Install xorg-x11-drv-nvidia
              become: true
              ansible.builtin.package:
                name: xorg-x11-drv-nvidia
                state: present
          rescue:
            - name: ERROR package couldn't be installed
              ansible.builtin.fail:
                msg: "ERROR: package xorg-x11-drv-nvidia couldn't be installed. Did you enable RPM Fusion? Check https://rpmfusion.org/Configuration"

        - name: Install nvidia-container-toolkit and podman
          become: true
          ansible.builtin.package:
            name:
              - nvidia-container-toolkit
              - podman
            state: present

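        # Required for rootless podman: an unprivileged user cannot let
        # libnvidia-container manage device cgroups, so cgroup handling
        # is disabled in the toolkit config.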
        - name: Set no-cgroups to true
          become: true
          ansible.builtin.lineinfile:
            path: /etc/nvidia-container-runtime/config.toml
            regexp: '^#no-cgroups = false'
            line: 'no-cgroups = true'
            state: present

    # -- Test
    - name: Check if the GPU is visible from the container
      ansible.builtin.shell: >-
        podman run --rm --security-opt=label=disable
        --hooks-dir=/usr/share/containers/oci/hooks.d/
        {{ test_container_image }}
        nvidia-smi -L
      register: container_nvidia_smi_L
      changed_when: false
      failed_when: "'UUID: GPU-' not in container_nvidia_smi_L.stdout"

If you get no errors, then everything should be ready on your machine.
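
If you prefer not to use Ansible, the playbook boils down to these manual steps (a sketch mirroring the tasks above, using the same rhel9.0 repo):

# Add the nvidia-container-toolkit repo
curl -s -L https://nvidia.github.io/nvidia-docker/rhel9.0/nvidia-docker.repo | \
    sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

# Install the toolkit and podman
sudo dnf install -y nvidia-container-toolkit podman

# Allow rootless podman to use the GPU (same change the playbook makes)
sudo sed -i 's/^#no-cgroups = false/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml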

How to use

You will need to pass two flags in order to allow the podman container to access your host's GPU:

# You must provide the following flags:
# --security-opt=label=disable
# --hooks-dir=/usr/share/containers/oci/hooks.d/

podman run --rm -it \
    --security-opt=label=disable \
    --hooks-dir=/usr/share/containers/oci/hooks.d/ \
    docker.io/nvidia/cuda:11.6.2-base-ubuntu20.04

# (inside the container)
nvidia-smi
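
As a quick end-to-end test for machine-learning workloads, the same two flags work with any CUDA-enabled image; for example (the PyTorch image and tag here are just an illustration, not something the setup above requires):

podman run --rm \
    --security-opt=label=disable \
    --hooks-dir=/usr/share/containers/oci/hooks.d/ \
    docker.io/pytorch/pytorch:latest \
    python -c "import torch; print(torch.cuda.is_available())"

# Should print "True" if the GPU is reachable from the container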
