Hadoop YARN / YARN-9099

GpuResourceAllocator#getReleasingGpus calculates number of GPUs in a wrong way


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0, 3.2.1, 3.1.3
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      getReleasingGpus plays an important role in the calculation that happens when GpuResourceAllocator assigns GPUs to a container; see GpuResourceAllocator#internalAssignGpus.

      If multiple GPUs are assigned to the same container, getReleasingGpus returns an incorrect number.
      The method iterates over the (GPU device, container ID) mappings and looks up the container by its ID once per mapping, so a container is retrieved as many times as its ID appears among the mappings.
      For each retrieved container, the value of its GPU resource is added to a running sum.
      As a result, if a container is mapped to 2 or more devices, its GPU resource value is added to the sum once per GPU device it holds instead of only once.
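      The loop described above can be modelled in a few lines of Java to make the double counting visible. This is a simplified sketch, not the NodeManager code: GPU devices and container IDs are plain Strings, and gpuResourceValue and countedContainers are hypothetical stand-ins for the container's GPU resource lookup and for whatever container-state check the real method applies.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of the flawed summation (hypothetical names and types).
class ReleasingGpusBuggy {

    static long releasingGpus(Map<String, String> usedDevices,
                              Map<String, Long> gpuResourceValue,
                              Set<String> countedContainers) {
        long sum = 0;
        // One iteration per (device -> container) entry, not per container.
        for (Map.Entry<String, String> entry : usedDevices.entrySet()) {
            String containerId = entry.getValue();
            if (countedContainers.contains(containerId)) {
                // A container mapped to N devices is added N times here.
                sum += gpuResourceValue.get(containerId);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical input: container "c1" holds three GPUs, resource value 3.
        Map<String, String> usedDevices = new LinkedHashMap<>();
        usedDevices.put("GPU1", "c1");
        usedDevices.put("GPU2", "c1");
        usedDevices.put("GPU3", "c1");
        long result = releasingGpus(usedDevices, Map.of("c1", 3L), Set.of("c1"));
        System.out.println(result); // prints 9, although c1 only holds 3 GPUs
    }
}
```

      The result grows with the number of devices per container, which is exactly the N-fold overcount described above.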

      Example:
      Let's suppose usedDevices contains these mappings:

      • (GPU1, container1)
      • (GPU2, container1)
      • (GPU3, container2)

      The GPU resource value is 2 for container1 and 1 for container2.
      Then, if container1 is in a running state, getReleasingGpus will return 4 instead of 2, because container1's value of 2 is added once for GPU1 and once for GPU2.
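      One way to get the expected value of 2 in this example is to collect the distinct container IDs before summing, so each container contributes its GPU resource value exactly once. The issue text does not spell out a fix; the sketch below assumes this de-duplication approach and reuses the example's mappings. As before, gpuResourceValue and countedContainers are hypothetical stand-ins, and only container1 is treated as passing the container-state check, as in the example.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch of a de-duplicated sum (assumed approach, hypothetical names).
class ReleasingGpusDeduplicated {

    static long releasingGpus(Map<String, String> usedDevices,
                              Map<String, Long> gpuResourceValue,
                              Set<String> countedContainers) {
        // Each container ID appears once, however many devices it holds.
        Set<String> containerIds = new LinkedHashSet<>(usedDevices.values());
        long sum = 0;
        for (String containerId : containerIds) {
            if (countedContainers.contains(containerId)) {
                sum += gpuResourceValue.get(containerId);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // The mappings from the example: container1 holds GPU1 and GPU2.
        Map<String, String> usedDevices = new LinkedHashMap<>();
        usedDevices.put("GPU1", "container1");
        usedDevices.put("GPU2", "container1");
        usedDevices.put("GPU3", "container2");
        Map<String, Long> gpuValue = Map.of("container1", 2L, "container2", 1L);
        // Only container1 is counted, matching the example's assumption.
        System.out.println(releasingGpus(usedDevices, gpuValue, Set.of("container1"))); // prints 2
    }
}
```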


            People

            • Assignee: Szilard Nemeth (snemeth)
            • Reporter: Szilard Nemeth (snemeth)
