Hadoop YARN / YARN-8423

GPU does not get released even though the application gets killed.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.0, 3.1.1
    • Component/s: yarn
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Run a TensorFlow app requesting one GPU.
      Kill the application once the GPU is allocated.
      Query the NodeManager after the application is killed. The GPU is not released; it is still listed under assignedGpuDevices:

       curl -i <NM>/ws/v1/node/resources/yarn.io%2Fgpu
      {"gpuDeviceInformation":{"gpus":[{"productName":"<productName>","uuid":"GPU-<UID>","minorNumber":0,"gpuUtilizations":{"overallGpuUtilization":0.0},"gpuMemoryUsage":{"usedMemoryMiB":73,"availMemoryMiB":12125,"totalMemoryMiB":12198},"temperature":{"currentGpuTemp":28.0,"maxGpuTemp":85.0,"slowThresholdGpuTemp":82.0}},{"productName":"<productName>","uuid":"GPU-<UID>","minorNumber":1,"gpuUtilizations":{"overallGpuUtilization":0.0},"gpuMemoryUsage":{"usedMemoryMiB":73,"availMemoryMiB":12125,"totalMemoryMiB":12198},"temperature":{"currentGpuTemp":28.0,"maxGpuTemp":85.0,"slowThresholdGpuTemp":82.0}}],"driverVersion":"<version>"},"totalGpuDevices":[{"index":0,"minorNumber":0},{"index":1,"minorNumber":1}],"assignedGpuDevices":[{"index":0,"minorNumber":0,"containerId":"container_<containerID>"}]}
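
      The check described above can be scripted. The following is a minimal sketch, not part of the original report: the helper name leaked_gpu_assignments and the live_container_ids argument are hypothetical, and the set of container IDs still running on the node is assumed to be collected separately. It only relies on the assignedGpuDevices field visible in the response above.

```python
import json


def leaked_gpu_assignments(nm_response: str, live_container_ids: set) -> list:
    """Return assignedGpuDevices entries whose container is no longer live.

    nm_response is the JSON body returned by the NodeManager GPU resource
    endpoint; live_container_ids is a hypothetical, separately gathered set
    of container IDs that are still running on the node.
    """
    payload = json.loads(nm_response)
    return [
        dev
        for dev in payload.get("assignedGpuDevices", [])
        if dev.get("containerId") not in live_container_ids
    ]


# Trimmed sample shaped like the response above; the container ID is a placeholder.
sample = json.dumps({
    "totalGpuDevices": [
        {"index": 0, "minorNumber": 0},
        {"index": 1, "minorNumber": 1},
    ],
    "assignedGpuDevices": [
        {"index": 0, "minorNumber": 0, "containerId": "container_example"},
    ],
})

# After the application is killed, no containers are expected to be live.
leaked = leaked_gpu_assignments(sample, live_container_ids=set())
for dev in leaked:
    print(f"GPU minor {dev['minorNumber']} still assigned to {dev['containerId']}")
```

      With an empty set of live containers, any entry the helper returns corresponds to a GPU that should have been released but was not, which is the symptom reported here.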
      

      Attachments

        1. YARN-8423.003.patch
          10 kB
          Sunil G
        2. YARN-8423.002.patch
          10 kB
          Sunil G
        3. YARN-8423.001.patch
          5 kB
          Sunil G
        4. kill-container-nm.log
          4 kB
          Wangda Tan

            People

              Assignee: Sunil G (sunilg)
              Reporter: Sumana Sathish (ssathish@hortonworks.com)
              Votes: 0
              Watchers: 10
