  1. Hadoop YARN
  2. YARN-8232

RMContainer lost queue name when RM HA happens

    Details

    • Hadoop Flags:
      Reviewed
    • Flags:
      Patch

      Description

      RMContainer has a member variable, queuename, that stores which queue the container belongs to. When an RM HA failover happens and the scheduler recovers RMContainers from NM reports, the queue name is not recovered and is always null.

      This causes problems; preemption is one example. Preemption uses the container's queue name to deduct preemptable resources when more than one preemption selector is in use (for example, when intra-queue preemption is enabled). The details are in

      CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates()

      If the container's queue name is null, this function throws a YarnRuntimeException because it tries to get the container's TempQueuePerPartition, and the preemption fails.
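The failure mode can be illustrated with a minimal sketch. It is not the actual YARN code: the class `NullQueueNameSketch`, its `Container` type, and the `deduct` method are hypothetical stand-ins, a plain map keyed by queue name stands in for the TempQueuePerPartition lookup, and `IllegalStateException` stands in for YarnRuntimeException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins; none of these are real YARN classes.
class NullQueueNameSketch {

    /** Minimal container record holding only what this sketch needs. */
    static final class Container {
        final String id;
        final String queueName; // null models a container recovered without its queue
        final int memoryMb;

        Container(String id, String queueName, int memoryMb) {
            this.id = id;
            this.queueName = queueName;
            this.memoryMb = memoryMb;
        }
    }

    /**
     * Deducts the container's memory from its queue's preemptable budget.
     * The map stands in for the per-queue lookup; a missing queue is fatal.
     */
    static void deduct(Map<String, Integer> preemptableMb, Container c) {
        Integer budget = (c.queueName == null) ? null : preemptableMb.get(c.queueName);
        if (budget == null) {
            // Stand-in for the YarnRuntimeException described in the issue.
            throw new IllegalStateException(
                "No queue found for container " + c.id + " (queueName=" + c.queueName + ")");
        }
        preemptableMb.put(c.queueName, budget - c.memoryMb);
    }

    public static void main(String[] args) {
        Map<String, Integer> preemptableMb = new HashMap<>();
        preemptableMb.put("root.a", 4096);

        // A container with a queue name is deducted normally.
        deduct(preemptableMb, new Container("c1", "root.a", 1024));
        System.out.println("root.a budget after c1: " + preemptableMb.get("root.a"));

        // A recovered container with a null queue name aborts the deduction.
        try {
            deduct(preemptableMb, new Container("c2", null, 1024));
        } catch (IllegalStateException e) {
            System.out.println("Deduction failed: " + e.getMessage());
        }
    }
}
```

In this sketch a single container with a null queue name is enough to abort the deduction, which matches the reported symptom of preemption failing after a failover.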

      Our patch solves this problem by setting the container's queue name when containers are recovered. The patch is based on branch-2.8.3.
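The idea behind the fix can be sketched as follows. This is an illustration under stated assumptions, not the contents of the attached patches: `RecoverQueueNameSketch`, `RecoveredContainer`, and `recover` are hypothetical names, and the sketch assumes the queue can be looked up from the application that owns the container.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of container recovery; not the real scheduler code.
class RecoverQueueNameSketch {

    /** Stand-in for a recovered container; queueName is null until it is set. */
    static final class RecoveredContainer {
        final String containerId;
        final String applicationId;
        private String queueName;

        RecoveredContainer(String containerId, String applicationId) {
            this.containerId = containerId;
            this.applicationId = applicationId;
        }

        String getQueueName() { return queueName; }
        void setQueueName(String queueName) { this.queueName = queueName; }
    }

    /**
     * Rebuilds a container from a node report and restores its queue name.
     * appQueues maps application id to queue name (an assumption of this sketch).
     */
    static RecoveredContainer recover(String containerId, String applicationId,
                                      Map<String, String> appQueues) {
        RecoveredContainer c = new RecoveredContainer(containerId, applicationId);
        String queue = appQueues.get(applicationId);
        if (queue != null) {
            // The step that was missing: without it the queue name stays null.
            c.setQueueName(queue);
        }
        return c;
    }

    public static void main(String[] args) {
        Map<String, String> appQueues = new HashMap<>();
        appQueues.put("app_1", "root.a");

        RecoveredContainer c = recover("container_1", "app_1", appQueues);
        System.out.println(c.containerId + " recovered in queue " + c.getQueueName());
    }
}
```

Restoring the queue name at recovery time, rather than guarding against null inside the preemption code, keeps every later consumer of the queue name working without changes.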


        Attachments

        1. YARN-8232.001.patch
          2 kB
          Hu Ziqian
        2. YARN-8232.002.patch
          7 kB
          Hu Ziqian
        3. YARN-8232.003.patch
          7 kB
          Hu Ziqian
        4. YARN-8232-branch-2.8.3.001.patch
          2 kB
          Hu Ziqian

            People

            • Assignee:
              Hu Ziqian
            • Reporter:
              Hu Ziqian
            • Votes:
              0
            • Watchers:
              4
