MESOS-10192

Recent Nvidia CUDA changes break Mesos GPU support

      Description

      The layout of the Nvidia device files appears to have changed recently, and /dev/nvidia-caps now exists but is not a device node. See the MIG user guide: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/

      This prevents GPU tasks from launching:

      W0929 17:27:21.002178 65691 http.cpp:3436] Failed to launch container c08e1fc7-53c4-427e-a1a1-85b770e77d69.738440a3-f4cc-42ce-8978-418ba0011160: Failed to copy device '/dev/nvidia-caps': Failed to get source dev: Not a special file: /dev/nvidia-caps
      

      The failure comes from this code, which detects the Nvidia device files: https://github.com/apache/mesos/blob/8700dd8d5ece658804d7b7a40863800dcc5c72bc/src/slave/containerizer/mesos/isolators/gpu/isolator.cpp#L438-L454
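
      To illustrate the failure mode, here is a minimal Python sketch (not the Mesos C++ code, and not necessarily the fix that was applied). It assumes the detection works by expanding a pattern such as /dev/nvidia*, and that /dev/nvidia-caps is a directory holding further device nodes, as the log message and the MIG guide suggest. The function names is_device_node and nvidia_device_nodes are hypothetical. The idea is to keep only entries that are character or block special files, and to look one level inside any directory that matches.

```python
import glob
import os
import stat


def is_device_node(path):
    """Return True if path is a character or block special file."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing or unreadable entries are treated as "not a device".
        return False
    return stat.S_ISCHR(mode) or stat.S_ISBLK(mode)


def nvidia_device_nodes(pattern="/dev/nvidia*"):
    """Expand pattern and return only device nodes.

    Assumption: the default pattern mirrors how the devices are
    discovered; it is illustrative, not taken from the Mesos source.
    """
    nodes = []
    for entry in sorted(glob.glob(pattern)):
        if is_device_node(entry):
            nodes.append(entry)
        elif os.path.isdir(entry):
            # A directory such as /dev/nvidia-caps is not itself a
            # special file, so collect the device nodes inside it.
            try:
                children = sorted(os.listdir(entry))
            except OSError:
                continue
            for child in children:
                child_path = os.path.join(entry, child)
                if is_device_node(child_path):
                    nodes.append(child_path)
    return nodes
```

      Calling nvidia_device_nodes() on a host without Nvidia devices returns an empty list. On a host with the newer layout, the directory itself would be left out of the result, so a later device copy would never be asked to treat it as a special file.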

            People

            • Assignee: Qian Zhang (qianzhang)
            • Reporter: Greg Mann (greggomann)