FairShare preemption doesn't enforce fairness between sibling queues in some cases

      Queue hierarchy:

      root (cluster: 1000GB, 1000 vcores)

      • q1 (maxResources: 10GB, 10 vcores)
        • q1.1 (weight: 1)
        • q1.2 (weight: 9)
      • q2
      • q3



      1. app1 with a demand of 100GB/100 vcores is added to q1.1 => it gets 10GB/10 vcores
        1. q1 reaches its max
      2. app2 with a demand of 1000GB/1000 vcores is added to q2 => it gets 990GB/990 vcores
        1. the cluster now runs at 100% capacity
      3. app3 with a demand of 100GB/100 vcores is added to q1.2 => ...

      Expected behavior: fair share preemption preempts containers from app1 (q1.1) so that app3 (q1.2) gets 9GB/9 vcores, in line with the 1:9 weights of q1.1 and q1.2.
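
      For reference, the expected split can be checked with a plain weight-proportional calculation. This is only a sketch of the arithmetic, not FairScheduler code: `weighted_shares` is a hypothetical helper, and it assumes both queues have demand above their share (true here, since each app asks for 100GB) and that no min/max limits apply to the child queues.

```python
def weighted_shares(total, weights):
    """Split `total` among queues in proportion to their weights.

    Hypothetical helper for illustration only; it ignores demand and
    per-queue min/max limits, which the real scheduler also considers.
    """
    weight_sum = sum(weights.values())
    return {name: total * w / weight_sum for name, w in weights.items()}


# q1 is capped at 10GB (maxResources), shared by q1.1 (weight 1) and q1.2 (weight 9).
shares_gb = weighted_shares(10, {"q1.1": 1, "q1.2": 9})
print(shares_gb)
```

      Under these assumptions q1.1 ends up with 1GB and q1.2 with 9GB, which is the 9GB/9 vcores expected for app3 (the vcores split is the same calculation).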

      Observed behavior: app3 is starving

      Some observations:

      1. We see some preemption happening from app2 (q2) that matches app3's starvation (9GB/9 vcores in this case). This may suggest that app3 preempts from app2 but can't use the preempted containers due to this check. Also, if the container to preempt is chosen at random, it is far more likely to come from app2 than from app1 because of the allocation sizes.
      2. Eliminating the max on q1 helps to resolve the issue, but we need to keep the max.
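
      The hypothesis in observation 1 can be illustrated with a toy model. This is not the scheduler's actual logic: `fits_under_max` is a hypothetical stand-in for whatever limit check applies, and the numbers are the ones from the scenario above. It only shows why freeing resources in q2 would not help app3 while q1 sits at its max.

```python
def fits_under_max(queue_usage_gb, queue_max_gb, request_gb):
    """Hypothetical limit check: would the request keep the queue within its max?"""
    return queue_usage_gb + request_gb <= queue_max_gb


Q1_MAX_GB = 10
q1_usage_gb = 10  # app1 (q1.1) holds all of q1's allowance

# Case A: 9GB is preempted from app2 (q2). q1's usage is unchanged,
# so even a 1GB container for app3 would exceed q1's max.
usable_after_q2_preemption = fits_under_max(q1_usage_gb, Q1_MAX_GB, 1)

# Case B: 9GB is preempted from app1 (q1.1). q1's usage drops to 1GB,
# leaving room for app3's 9GB.
usable_after_q1_preemption = fits_under_max(q1_usage_gb - 9, Q1_MAX_GB, 9)

print(usable_after_q2_preemption, usable_after_q1_preemption)
```

      In this model only preemption inside q1 makes room for app3, which is consistent with removing the max on q1 making the problem go away.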


      1. This is an oversimplified version of our production setup. I can provide more details if needed.
      2. I have a heap dump of the issue that I can't share because of our policy, but I can look up some state if needed.
      3. My co-worker reported a bug for the same issue, YARN-11171; please feel free to close it as a duplicate.





