Description
RMContainer has a member variable, queuename, that stores the name of the queue the container belongs to. When an RM HA failover happens and the scheduler recovers RMContainers based on NM reports, the queue name isn't recovered and is always null.
This causes problems, for example in preemption. When more than one preemption selector is in use (for example, when intra-queue preemption is enabled), preemption uses each container's queue name to deduct preemptable resources. The details are in
CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates()
If the container's queue name is null, this function throws a YarnRuntimeException when it tries to get the container's TempQueuePerPartition, and preemption fails.
Our patch solves this problem by setting the container's queue name when containers are recovered. The patch is based on branch-2.8.3.
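To make the failure mode and the fix concrete, here is a minimal, self-contained sketch. It does not use the real YARN classes: SketchContainer, recoverWithoutQueueName, recoverWithQueueName and deductPreemptable are hypothetical stand-ins, the map is a simplified substitute for the per-queue lookup, and IllegalStateException stands in for YarnRuntimeException. It only illustrates why a null queue name breaks a lookup keyed by queue name, and how setting the name during recovery avoids that.

```java
import java.util.HashMap;
import java.util.Map;

public class QueueNameRecoverySketch {

    /** Hypothetical, simplified stand-in for RMContainer. */
    public static final class SketchContainer {
        public final String containerId;
        public final int memoryMb;
        public String queueName; // stays null unless recovery sets it

        public SketchContainer(String containerId, int memoryMb) {
            this.containerId = containerId;
            this.memoryMb = memoryMb;
        }
    }

    /** Recovery as described in the report: the queue name is never set. */
    public static SketchContainer recoverWithoutQueueName(String id, int memoryMb) {
        return new SketchContainer(id, memoryMb);
    }

    /** Recovery with the fix: the queue name is set while recovering. */
    public static SketchContainer recoverWithQueueName(String id, int memoryMb,
                                                       String appQueue) {
        SketchContainer c = new SketchContainer(id, memoryMb);
        c.queueName = appQueue; // assumed to come from the owning application
        return c;
    }

    /**
     * Simplified stand-in for the deduction step: finds the per-queue
     * preemptable amount by queue name and subtracts the container's size.
     * Throws if no per-queue entry can be found for the container.
     */
    public static void deductPreemptable(Map<String, Integer> preemptableByQueue,
                                         SketchContainer c) {
        Integer remaining =
            c.queueName == null ? null : preemptableByQueue.get(c.queueName);
        if (remaining == null) {
            throw new IllegalStateException("No queue entry for container "
                + c.containerId + " (queue=" + c.queueName + ")");
        }
        preemptableByQueue.put(c.queueName, Math.max(0, remaining - c.memoryMb));
    }

    public static void main(String[] args) {
        Map<String, Integer> preemptable = new HashMap<>();
        preemptable.put("queueA", 4096);

        // With the queue name set during recovery, the deduction succeeds.
        deductPreemptable(preemptable,
            recoverWithQueueName("container_1", 1024, "queueA"));
        System.out.println("queueA preemptable left: " + preemptable.get("queueA"));

        // Without it, the lookup fails, mirroring the reported exception.
        try {
            deductPreemptable(preemptable,
                recoverWithoutQueueName("container_2", 1024));
        } catch (IllegalStateException e) {
            System.out.println("Deduction failed: " + e.getMessage());
        }
    }
}
```

In the sketch, the only difference between the two recovery paths is the single assignment to queueName, which is the essence of the change described above.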