Apache YuniKorn / YUNIKORN-2678

Fair queue sorting is inconsistent

      Please see the attached queue configuration (jira-queues.yaml).

      I create 100 pods in each of Tier0, Tier1, Tier2 and Tier3 (400 pods in total). Each pod requests 1 vCore. Initially there are no suitable nodes to run the pods, so all of them are Pending. Karpenter then provisions nodes and YuniKorn reacts by binding the pods.

      Given this code, I would expect YuniKorn to distribute the allocations such that each of the tiered queues reaches its guarantee. Instead, I observed a roughly even distribution of allocations across all of the queues.
      Tier0 fails to meet its guarantee while Tier3, for instance, dramatically overshoots its own.
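
      As a rough illustration of the expected behavior, the sketch below hands out capacity one vCore at a time to the queue with the lowest allocated/guaranteed ratio. It is only a toy model, not YuniKorn's actual sorting code. The guarantee values are hypothetical (the real ones are in the attached jira-queues.yaml), and the names `fair_share_order` and `simulate` are made up for this example.

```python
def fair_share_order(queues):
    """Return queue names sorted by allocated/guaranteed ratio, lowest first.

    queues maps a queue name to an (allocated_vcores, guaranteed_vcores) pair.
    """
    def ratio(item):
        _, (allocated, guaranteed) = item
        # A queue without a guarantee is treated as already satisfied.
        return allocated / guaranteed if guaranteed > 0 else float("inf")

    return [name for name, _ in sorted(queues.items(), key=ratio)]


def simulate(guarantees, pending, capacity):
    """Allocate `capacity` vCores one by one to the most under-served queue."""
    allocated = {name: 0 for name in guarantees}
    remaining = dict(pending)
    for _ in range(capacity):
        # Only queues that still have pending pods compete for the next vCore.
        candidates = {
            name: (allocated[name], guarantees[name])
            for name in guarantees
            if remaining[name] > 0
        }
        if not candidates:
            break
        winner = fair_share_order(candidates)[0]
        allocated[winner] += 1
        remaining[winner] -= 1
    return allocated


# Hypothetical guarantees in vCores; 100 pending 1-vCore pods per queue.
guarantees = {"root.tiers.0": 80, "root.tiers.1": 40,
              "root.tiers.2": 20, "root.tiers.3": 10}
pending = {name: 100 for name in guarantees}
print(simulate(guarantees, pending, capacity=150))
```

      With these hypothetical numbers and a capacity equal to the sum of the guarantees, every queue in the toy model ends up exactly at its guarantee, which is the outcome expected above rather than an even split.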


      > kubectl get pods -n finance | grep tier-0 | grep Pending | wc -l
      > kubectl get pods -n finance | grep tier-1 | grep Pending | wc -l
      > kubectl get pods -n finance | grep tier-2 | grep Pending | wc -l
      > kubectl get pods -n finance | grep tier-3 | grep Pending | wc -l
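
      The same per-tier counts could be collected in one pass from Python. This is a minimal sketch: `count_pending_by_tier` is a hypothetical helper, and it counts by pod phase, which also includes pods that kubectl would display as ContainerCreating, so the numbers may differ slightly from the grep output.

```python
import re
from collections import Counter

# Matches the "tier-<n>" part of pod names such as "rolling-test-tier-0-exec-12".
TIER_PATTERN = re.compile(r"tier-(\d+)")


def count_pending_by_tier(pods):
    """Count Pending pods per tier.

    pods is an iterable of (pod_name, phase) pairs.
    """
    counts = Counter()
    for name, phase in pods:
        match = TIER_PATTERN.search(name)
        if match and phase == "Pending":
            counts[f"tier-{match.group(1)}"] += 1
    return dict(counts)


# Against a live cluster (requires a kubeconfig, so left commented out):
# from kubernetes import client, config
# config.load_kube_config()
# items = client.CoreV1Api().list_namespaced_pod("finance").items
# print(count_pending_by_tier((p.metadata.name, p.status.phase) for p in items))
```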

      Please see the attached screenshots for queue usage.

      Note that this situation can also be reproduced without Karpenter by setting YuniKorn's `service.schedulingInterval` to a long duration, say 1m. Doing so forces YuniKorn to react to 400 pods across 4 queues at roughly the same time, which forces it to prioritize allocations between the queues.
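
      One possible way to apply that setting from Python is sketched below. The ConfigMap name `yunikorn-configs` and the namespace `yunikorn` are assumptions about the deployment, `scheduling_interval_patch` is a hypothetical helper, and the scheduler may need a restart before the new interval takes effect.

```python
def scheduling_interval_patch(interval="1m"):
    """Build a ConfigMap patch body that sets service.schedulingInterval."""
    return {"data": {"service.schedulingInterval": interval}}


# Applying the patch needs cluster access, so it is left commented out.
# ConfigMap name and namespace are assumptions; adjust them to the deployment.
# from kubernetes import client, config
# config.load_kube_config()
# client.CoreV1Api().patch_namespaced_config_map(
#     name="yunikorn-configs",
#     namespace="yunikorn",
#     body=scheduling_interval_patch("1m"),
# )
```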

      Test code to generate Pods:

      from kubernetes import client, config

      # Load credentials from the local kubeconfig before creating the API client.
      config.load_kube_config()
      v1 = client.CoreV1Api()

      def create_pod_manifest(tier, executor):
          """Build the manifest for a 1-vCore pod submitted to the given tier queue."""
          pod_manifest = {
              'apiVersion': 'v1',
              'kind': 'Pod',
              'metadata': {
                  'name': f"rolling-test-tier-{tier}-exec-{executor}",
                  'namespace': 'finance',
                  'labels': {
                      'applicationId': f"MyOwnApplicationId-tier-{tier}",
                      'queue': f"root.tiers.{tier}"
                  },
                  # The user.info JSON is not a valid label value, so it is set as an annotation.
                  'annotations': {
                      "yunikorn.apache.org/user.info": '{"user":"system:serviceaccount:finance:spark","groups":["system:serviceaccounts","system:serviceaccounts:finance","system:authenticated"]}'
                  }
              },
              'spec': {
                  # Only schedule onto the dedicated spark nodes.
                  "affinity": {
                      "nodeAffinity": {
                          "requiredDuringSchedulingIgnoredDuringExecution": {
                              "nodeSelectorTerms": [{
                                  "matchExpressions": [{
                                      "key": "di.rbx.com/dedicated",
                                      "operator": "In",
                                      "values": ["spark"]
                                  }]
                              }]
                          }
                      }
                  },
                  "tolerations": [{
                      "effect": "NoSchedule",
                      "key": "dedicated",
                      "operator": "Equal",
                      "value": "spark"
                  }],
                  "schedulerName": "yunikorn",
                  'restartPolicy': 'Always',
                  'containers': [{
                      "name": "ubuntu",
                      'image': 'ubuntu',
                      "command": ["sleep", "604800"],
                      "imagePullPolicy": "IfNotPresent",
                      "resources": {
                          "limits": {'cpu': "1"},
                          "requests": {'cpu': "1"}
                      }
                  }]
              }
          }
          return pod_manifest

      # Create 100 pods in each of the four tier queues (400 pods in total).
      for i in range(0, 4):
          tier = str(i)
          for j in range(0, 100):
              executor = str(j)
              pod_manifest = create_pod_manifest(tier, executor)
              api_response = v1.create_namespaced_pod(body=pod_manifest, namespace="finance")
              print(f"creating tier( {tier} ) exec( {executor} )")






