  1. Hadoop YARN
  2. YARN-8423

GPU does not get released even though the application gets killed.

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: yarn
    • Labels:
      None
    • Target Version/s:

      Description

      Run a TensorFlow app requesting one GPU.
      Kill the application once the GPU is allocated.
      Query the NodeManager once the application is killed. The GPU is not released: it is still listed under assignedGpuDevices.

       curl -i <NM>/ws/v1/node/resources/yarn.io%2Fgpu
      {"gpuDeviceInformation":{"gpus":[{"productName":"<productName>","uuid":"GPU-<UID>","minorNumber":0,"gpuUtilizations":{"overallGpuUtilization":0.0},"gpuMemoryUsage":{"usedMemoryMiB":73,"availMemoryMiB":12125,"totalMemoryMiB":12198},"temperature":{"currentGpuTemp":28.0,"maxGpuTemp":85.0,"slowThresholdGpuTemp":82.0}},{"productName":"<productName>","uuid":"GPU-<UID>","minorNumber":1,"gpuUtilizations":{"overallGpuUtilization":0.0},"gpuMemoryUsage":{"usedMemoryMiB":73,"availMemoryMiB":12125,"totalMemoryMiB":12198},"temperature":{"currentGpuTemp":28.0,"maxGpuTemp":85.0,"slowThresholdGpuTemp":82.0}}],"driverVersion":"<version>"},"totalGpuDevices":[{"index":0,"minorNumber":0},{"index":1,"minorNumber":1}],"assignedGpuDevices":[{"index":0,"minorNumber":0,"containerId":"container_<containerID>"}]}
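
      The check above can be scripted. The following is a minimal sketch, not part of the original report: it reads the same NodeManager endpoint and lists the GPUs still bound to a container. The helper names (assigned_gpus, leaked_gpus, fetch_gpu_resources) and the address "http://nm-host:8042" are assumptions made for illustration; only the endpoint path and the JSON field names come from the output shown above.

```python
import json
from urllib.request import urlopen


def assigned_gpus(payload):
    """Return (index, minorNumber, containerId) for each GPU reported as assigned."""
    return [
        (d.get("index"), d.get("minorNumber"), d.get("containerId"))
        for d in payload.get("assignedGpuDevices", [])
    ]


def leaked_gpus(payload, killed_container_ids):
    """Return assigned GPUs whose container is known to have been killed."""
    killed = set(killed_container_ids)
    return [gpu for gpu in assigned_gpus(payload) if gpu[2] in killed]


def fetch_gpu_resources(nm_address):
    """Fetch the GPU resource view from a NodeManager.

    nm_address is a placeholder such as "http://nm-host:8042" (assumed,
    not taken from the report).
    """
    url = f"{nm_address}/ws/v1/node/resources/yarn.io%2Fgpu"
    with urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Trimmed-down payload with the same shape as the response in the report;
    # "container_example" is a made-up container ID.
    sample = {
        "totalGpuDevices": [
            {"index": 0, "minorNumber": 0},
            {"index": 1, "minorNumber": 1},
        ],
        "assignedGpuDevices": [
            {"index": 0, "minorNumber": 0, "containerId": "container_example"}
        ],
    }
    for index, minor, container in leaked_gpus(sample, ["container_example"]):
        print(f"GPU index={index} minor={minor} still held by {container}")
```

      Against a live node, fetch_gpu_resources would supply the payload. A non-empty result from leaked_gpus after the application has been killed would indicate the behaviour described in this issue.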
      

        Attachments

        1. kill-container-nm.log
          4 kB
          Wangda Tan
        2. YARN-8423.001.patch
          5 kB
          Sunil Govindan
        3. YARN-8423.002.patch
          10 kB
          Sunil Govindan

            People

            • Assignee:
              sunilg Sunil Govindan
            • Reporter:
              ssathish@hortonworks.com Sumana Sathish
            • Votes:
              0
            • Watchers:
              6

              Dates

              • Created:
                Updated: