Details
Improvement
Status: Resolved
Major
Resolution: Fixed
None
None
Mesosphere Sprint 36, Mesosphere Sprint 37
3
Description
Currently, access to `/dev/nvidiactl` and `/dev/nvidia-uvm` is granted to, or revoked from, a container only as GPUs are added to or removed from it. On some level this makes sense, because most jobs don't need access to these devices unless they are also using a GPU. However, there are cases where access to these files is appropriate even when no GPU is in use: for example, running `nvidia-smi` to control the global state of the underlying NVIDIA driver.
We should add `/dev/nvidiactl` and `/dev/nvidia-uvm` to the default whitelist of devices included in every container when the `gpu/nvidia` isolator is enabled. This will allow a container to run standard NVIDIA driver tools (such as `nvidia-smi`) even when no GPUs have been granted to it. Instead of failing abnormally, these tools will report that no GPUs are installed.
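The proposed behavior can be illustrated with a minimal sketch. This is not the Mesos implementation: the function names, the `BASE_DEVICES` list, and the per-GPU `/dev/nvidia<N>` handling are assumptions made for illustration only. The two control device paths and the `gpu/nvidia` isolator name are the only parts taken from this ticket.

```python
# Hypothetical sketch of the proposed whitelist logic; names are illustrative
# and not taken from the Mesos source.

# Assumed baseline of devices every container may access (illustrative only).
BASE_DEVICES = ["/dev/null", "/dev/zero", "/dev/urandom"]

# Control devices named in this ticket.
NVIDIA_CONTROL_DEVICES = ["/dev/nvidiactl", "/dev/nvidia-uvm"]


def default_device_whitelist(isolators):
    """Return the devices granted to every container by default."""
    devices = list(BASE_DEVICES)
    # Proposed change: always expose the control devices when the
    # gpu/nvidia isolator is enabled, regardless of GPU allocation.
    if "gpu/nvidia" in isolators:
        devices.extend(NVIDIA_CONTROL_DEVICES)
    return devices


def container_devices(isolators, granted_gpus):
    """Return the default whitelist plus one device node per granted GPU."""
    devices = default_device_whitelist(isolators)
    # Assumption: GPU index N corresponds to the device node /dev/nvidiaN.
    devices.extend(f"/dev/nvidia{index}" for index in sorted(granted_gpus))
    return devices
```

Under these assumptions, `container_devices(["gpu/nvidia"], [])` still includes both control devices, which is the situation in which a tool such as `nvidia-smi` could start and report that no GPUs are present.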