  1. Apache YuniKorn
  2. YUNIKORN-2678

Yunikorn does not appear to be considering Guaranteed resources when allocating Pending Pods.


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.5.1
    • Fix Version/s: None
    • Component/s: core - scheduler
    • Labels: None
    • Environment: EKS 1.29

    Description

      Please see the attached queue configuration (jira-queues.yaml).

      I create 100 Pods in each of Tier0, Tier1, Tier2 and Tier3. Each Pod requests 1 vCore. Initially, there are no suitable nodes to run the Pods, so all of them are Pending. Karpenter soon provisions nodes, and Yunikorn reacts by binding the Pods.

      Given this setup, I would expect Yunikorn to distribute the allocations such that each of the tiered queues reaches its Guaranteed resources. Instead, I observed a roughly even distribution of allocations across all of the queues.
      Tier0 fails to meet its Guarantee while Tier3, for instance, dramatically overshoots its own.

      > kubectl get pods -n finance | grep tier-0 | grep Pending | wc -l
         86
      > kubectl get pods -n finance | grep tier-1 | grep Pending | wc -l
         83
      > kubectl get pods -n finance | grep tier-2 | grep Pending | wc -l
         78
      > kubectl get pods -n finance | grep tier-3 | grep Pending | wc -l
         77
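The Pending counts above can be turned into per-tier allocation counts with a little arithmetic. The following is a minimal sketch, not part of the original test code: the helper names `pending_by_tier` and `allocated_by_tier` are hypothetical, and it assumes that every Pod that is not Pending has been allocated and that each tier received exactly 100 Pods.

```python
import re
from collections import Counter

# Pod names follow the pattern used by the test code in this report.
TIER_RE = re.compile(r"rolling-test-tier-(\d+)-exec-\d+")


def pending_by_tier(pods):
    """Count Pending Pods per tier from (pod_name, phase) pairs."""
    counts = Counter()
    for name, phase in pods:
        match = TIER_RE.fullmatch(name)
        if match and phase == "Pending":
            counts[int(match.group(1))] += 1
    return dict(counts)


def allocated_by_tier(pending, pods_per_tier=100):
    """Assumption: every Pod that is not Pending counts as allocated."""
    return {tier: pods_per_tier - count for tier, count in sorted(pending.items())}


# Pending counts reported by the kubectl commands above.
observed_pending = {0: 86, 1: 83, 2: 78, 3: 77}
allocated = allocated_by_tier(observed_pending)
print(allocated)  # {0: 14, 1: 17, 2: 22, 3: 23}
```

With these numbers, 76 of the 400 Pods were bound, spread over the tiers as 14, 17, 22 and 23, which is the roughly even distribution described in this report.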
      

      Please see the attached screenshots for queue usage.

      Note that this situation can also be reproduced without Karpenter by simply setting Yunikorn's `service.schedulingInterval` to a long duration, say 1m. Doing so forces Yunikorn to react to 400 Pods across 4 queues at roughly the same time, so it has to prioritize allocations between the queues.
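One possible way to apply that setting from Python is sketched below. This is an assumption-laden example rather than the reporter's procedure: the function names are hypothetical, and the ConfigMap name `yunikorn-configs` and namespace `yunikorn` are assumed defaults that may differ in a given installation. Depending on the YuniKorn version, the scheduler may need a restart before the new interval takes effect.

```python
def scheduling_interval_patch(interval="1m"):
    """Build a ConfigMap patch body that sets service.schedulingInterval."""
    return {"data": {"service.schedulingInterval": interval}}


def apply_scheduling_interval(interval="1m",
                              name="yunikorn-configs",  # assumed ConfigMap name
                              namespace="yunikorn"):    # assumed install namespace
    """Patch the scheduler ConfigMap; requires access to the cluster."""
    # Imported lazily so the patch builder works without cluster access.
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()
    return v1.patch_namespaced_config_map(
        name=name,
        namespace=namespace,
        body=scheduling_interval_patch(interval),
    )


if __name__ == "__main__":
    # Only prints the patch body; call apply_scheduling_interval() to apply it.
    print(scheduling_interval_patch("1m"))
```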

      Test code to generate Pods:

      from kubernetes import client, config

      # Load credentials from the local kubeconfig.
      config.load_kube_config()

      v1 = client.CoreV1Api()

      def create_pod_manifest(tier, exec_id):
          """Build a 1-CPU sleep Pod that targets queue root.tiers.<tier>."""
          pod_manifest = {
              'apiVersion': 'v1',
              'kind': 'Pod',
              'metadata': {
                  'name': f"rolling-test-tier-{tier}-exec-{exec_id}",
                  'namespace': 'finance',
                  'labels': {
                      'applicationId': f"MyOwnApplicationId-tier-{tier}",
                      'queue': f"root.tiers.{tier}"
                  },
                  # user.info is an annotation, so it is nested under 'annotations'.
                  'annotations': {
                      "yunikorn.apache.org/user.info": '{"user":"system:serviceaccount:finance:spark","groups":["system:serviceaccounts","system:serviceaccounts:finance","system:authenticated"]}'
                  }
              },
      
              'spec': {
                  "affinity": {
                      "nodeAffinity" : {
                          "requiredDuringSchedulingIgnoredDuringExecution" : {
                              "nodeSelectorTerms" : [
                                  {
                                      "matchExpressions" : [
                                          {
                                              "key" : "di.rbx.com/dedicated",
                                              "operator" : "In",
                                              "values" : ["spark"]
                                          }
                                      ]
                                  }
                              ]
      
                          }
                      },
                  },
                  "tolerations" : [
                      {
                          "effect" : "NoSchedule",
                          "key": "dedicated",
                          "operator" : "Equal",
                          "value" : "spark"
                      },
                  ],
      
                  "schedulerName": "yunikorn",
                  'restartPolicy': 'Always',
                  'containers': [{
                      "name": "ubuntu",
                      'image': 'ubuntu',
                      "command": ["sleep", "604800"],
                      "imagePullPolicy": "IfNotPresent",
                      "resources" : {
                          "limits" : {
                              'cpu' : "1"
                          },
                          "requests" : {
                              'cpu' : "1"
                          }
                      }
                  }]
              }
          }
          return pod_manifest
      
      # Create 100 Pods in each of the four tiers (0 through 3).
      for i in range(4):
          tier = str(i)
          for j in range(100):
              exec_id = str(j)  # avoid shadowing the built-in exec()
              pod_manifest = create_pod_manifest(tier, exec_id)
              print(pod_manifest)
              api_response = v1.create_namespaced_pod(body=pod_manifest, namespace="finance")
              print(f"creating tier( {tier} ) exec( {exec_id} )")

      Attachments

        1. jira-queues.yaml
          2 kB
          Paul Santa Clara
        2. jira-tier0-screenshot.png
          151 kB
          Paul Santa Clara
        3. jira-tier1-screenshot.png
          155 kB
          Paul Santa Clara
        4. jira-tier2-screenshot.png
          153 kB
          Paul Santa Clara
        5. jira-tier3-screenshot.png
          158 kB
          Paul Santa Clara

          People

            Assignee: Unassigned
            Reporter: Paul Santa Clara (psantaclara)