
[MESOS-5555] Always provide access to NVIDIA control devices within containers (if GPU isolation is enabled).


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: None
    • Sprint: Mesosphere Sprint 36, Mesosphere Sprint 37
    • Story Points: 3

    Description

      Currently, access to `/dev/nvidiactl` and `/dev/nvidia-uvm` is granted to or revoked from a container only as GPUs are added to and removed from it. On some level this makes sense, because most jobs don't need access to these devices unless they are also using a GPU. However, there are cases where access to these files is appropriate even when no GPU is in use, for example, running `nvidia-smi` to control the global state of the underlying NVIDIA driver.
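
      A minimal sketch of this grant/revoke mechanism, assuming the Linux cgroups-v1 devices controller (which is what enforces per-container device access on the agent). The cgroup path and helper names below are hypothetical, not the actual isolator code:

      ```cpp
      // Toggle a container's access to an NVIDIA device node by writing
      // entries to the cgroups-v1 devices controller.
      #include <sys/stat.h>
      #include <sys/sysmacros.h>  // major(), minor()

      #include <fstream>
      #include <sstream>
      #include <string>

      // Hypothetical devices cgroup for one container.
      const std::string CGROUP = "/sys/fs/cgroup/devices/mesos/<container-id>";

      // Build a devices-controller entry, e.g. "c 195:255 rw", for `path`.
      std::string entry(const std::string& path) {
        struct stat s;
        if (::stat(path.c_str(), &s) != 0) {
          return "";
        }
        std::ostringstream out;
        out << "c " << major(s.st_rdev) << ":" << minor(s.st_rdev) << " rw";
        return out.str();
      }

      // Grant access when a GPU (e.g. /dev/nvidia0) is allocated to the
      // container...
      void grant(const std::string& device) {
        std::ofstream(CGROUP + "/devices.allow") << entry(device);
      }

      // ...and revoke it when the GPU is removed. Today /dev/nvidiactl and
      // /dev/nvidia-uvm ride along with these per-GPU grants and revokes.
      void revoke(const std::string& device) {
        std::ofstream(CGROUP + "/devices.deny") << entry(device);
      }
      ```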

      We should add `/dev/nvidiactl` and `/dev/nvidia-uvm` to the default whitelist of devices included in every container when the `gpu/nvidia` isolator is enabled. This allows a container to run standard NVIDIA driver tools (such as `nvidia-smi`) even when no GPUs have been granted to it: instead of failing with abnormal errors, these tools will simply report that no GPUs are installed.
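
      A sketch of the proposed behavior, under the same assumptions as above: the two control devices are whitelisted once when the container is prepared, independent of GPU allocation, while per-GPU device nodes stay dynamic. `prepareContainer()` is a hypothetical name:

      ```cpp
      #include <string>
      #include <vector>

      void grant(const std::string& device);  // from the previous sketch

      void prepareContainer() {
        // Whitelisted unconditionally whenever the `gpu/nvidia` isolator is
        // enabled, so driver tools like `nvidia-smi` can open them even when
        // the container holds zero GPUs.
        const std::vector<std::string> controlDevices = {
          "/dev/nvidiactl",
          "/dev/nvidia-uvm",
        };

        for (const std::string& device : controlDevices) {
          grant(device);
        }

        // Per-GPU nodes (/dev/nvidia0, /dev/nvidia1, ...) are still granted
        // and revoked dynamically as GPUs are added to or removed from the
        // container.
      }
      ```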

      Attachments

        Activity

          People

            Assignee: Kevin Klues
            Reporter: Kevin Klues
            Shepherd: Benjamin Mahler
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: