Hadoop YARN / YARN-8232

RMContainer lost queue name when RM HA happens


Details

    • Reviewed
    • Patch

    Description

      RMContainer has a member variable, queuename, that stores which queue the container belongs to. When an RM HA failover happens and the scheduler recovers RMContainers from NM reports, the queue name is not recovered and is always null.

      This causes several problems. Here is one case in preemption. Preemption uses the container's queue name to deduct preemptable resources when more than one preemption selector is in use (for example, when intra-queue preemption is enabled). The details are in

      CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates()

      If the container's queue name is null, this function throws a YarnRuntimeException when it tries to get the container's TempQueuePerPartition, and the preemption fails.
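
      The failure mode can be illustrated with a minimal, self-contained sketch. It is not the Hadoop code: SketchContainer, SketchQueueBudget, and PreemptionDeductionSketch are hypothetical stand-ins, and IllegalStateException stands in for YarnRuntimeException. It only assumes that the per-queue bookkeeping is looked up by the container's queue name, as described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for RMContainer: only the fields this sketch needs.
class SketchContainer {
    final String id;
    final String queueName; // null models a container recovered without its queue
    final int memoryMb;

    SketchContainer(String id, String queueName, int memoryMb) {
        this.id = id;
        this.queueName = queueName;
        this.memoryMb = memoryMb;
    }
}

// Hypothetical stand-in for the per-queue preemption bookkeeping.
class SketchQueueBudget {
    int toBePreemptedMb;

    SketchQueueBudget(int toBePreemptedMb) {
        this.toBePreemptedMb = toBePreemptedMb;
    }
}

class PreemptionDeductionSketch {
    private final Map<String, SketchQueueBudget> budgets = new HashMap<>();

    void addQueue(String name, int toBePreemptedMb) {
        budgets.put(name, new SketchQueueBudget(toBePreemptedMb));
    }

    // Lookup by queue name; a null or unknown name has no entry and fails.
    SketchQueueBudget getQueue(String queueName) {
        SketchQueueBudget budget = budgets.get(queueName);
        if (budget == null) {
            throw new IllegalStateException(
                "cannot find queue bookkeeping for queueName=" + queueName);
        }
        return budget;
    }

    // Deduct an already-selected container from its queue's remaining budget.
    void deduct(SketchContainer c) {
        SketchQueueBudget budget = getQueue(c.queueName);
        budget.toBePreemptedMb = Math.max(0, budget.toBePreemptedMb - c.memoryMb);
    }

    int remaining(String queueName) {
        return getQueue(queueName).toBePreemptedMb;
    }
}
```

      In this sketch, deducting a container whose queueName is set works normally, while a container whose queueName is null makes the lookup throw, which mirrors the preemption failure reported here.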

      Our patch solves this problem by setting the container's queue name when containers are recovered. The patch is based on branch-2.8.3.
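
      The idea behind the fix can be sketched as follows. This is an illustration under stated assumptions, not the contents of the attached patch: RecoveredContainerSketch and RecoverySketch are hypothetical names, and the sketch assumes the scheduler can map a recovered container's application to its queue at recovery time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a container rebuilt from an NM report.
class RecoveredContainerSketch {
    final String containerId;
    final String appId;
    private String queueName; // stays null unless recovery sets it

    RecoveredContainerSketch(String containerId, String appId) {
        this.containerId = containerId;
        this.appId = appId;
    }

    String getQueueName() {
        return queueName;
    }

    void setQueueName(String queueName) {
        this.queueName = queueName;
    }
}

// Hypothetical stand-in for the scheduler's recovery path.
class RecoverySketch {
    // Assumption: the scheduler already knows which queue each app runs in.
    private final Map<String, String> appToQueue = new HashMap<>();

    void registerApp(String appId, String queueName) {
        appToQueue.put(appId, queueName);
    }

    // Rebuild the container and restore its queue name in the same step.
    RecoveredContainerSketch recover(String containerId, String appId) {
        RecoveredContainerSketch c = new RecoveredContainerSketch(containerId, appId);
        String queue = appToQueue.get(appId);
        if (queue != null) {
            c.setQueueName(queue);
        }
        return c;
    }
}
```

      With the queue name restored during recovery, later lookups keyed by queue name (such as the preemption deduction described above) have a non-null key to work with.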

      Attachments

        1. YARN-8232.001.patch
          2 kB
          Hu Ziqian
        2. YARN-8232-branch-2.8.3.001.patch
          2 kB
          Hu Ziqian
        3. YARN-8232.002.patch
          7 kB
          Hu Ziqian
        4. YARN-8232.003.patch
          7 kB
          Hu Ziqian

          People

            Hu Ziqian