[YUNIKORN-2678] Yunikorn does not appear to be considering Guaranteed resources when allocating Pending Pods. - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.5.1
Fix Version/s: None
Component/s: core - scheduler
Labels: None
Environment: EKS 1.29

Description

Please see the attached queue configuration (jira-queues.yaml).

I create 100 pods in each of Tier0, Tier1, Tier2 and Tier3 (400 pods in total), and each pod requests 1 vCore. Initially there are no suitable nodes to run them, so all of the pods are Pending. Karpenter soon provisions nodes, and Yunikorn reacts by binding the pods.

Given this setup, I would expect Yunikorn to distribute the allocations so that each of the tiered queues reaches its guaranteed resources. Instead, I observed a roughly even distribution of allocations across all of the queues: Tier0 fails to meet its guarantees while Tier3, for instance, dramatically overshoots them.
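To make the expected behaviour concrete, here is a minimal sketch of guarantee-aware allocation: every free vCore goes to the queue with the lowest allocated/guaranteed ratio that still has pending pods, so queues below their guarantee are served first. The guarantee values are hypothetical (the real ones are in jira-queues.yaml), the 76 free vCores equal the number of pods that left Pending in this report (400 minus 86+83+78+77), and the function name is made up for illustration. It shows the outcome I expected, not YuniKorn's actual queue-sorting code.

```python
from typing import Dict


def allocate_by_guarantee(
    pending: Dict[str, int],
    guaranteed: Dict[str, int],
    free_vcores: int,
) -> Dict[str, int]:
    """Assign free vCores one at a time (each pod needs 1 vCore).

    Each vCore goes to the queue with the lowest allocated/guaranteed
    ratio among queues that still have pending pods; ties are broken by
    queue name so the result is deterministic.
    """
    allocated = {queue: 0 for queue in pending}
    remaining = dict(pending)

    def usage_ratio(queue: str) -> float:
        # A queue without a guarantee is only served once every queue
        # that has a guarantee has run out of pending pods.
        g = guaranteed.get(queue, 0)
        return allocated[queue] / g if g > 0 else float("inf")

    for _ in range(free_vcores):
        candidates = [q for q, n in remaining.items() if n > 0]
        if not candidates:
            break  # nothing left to schedule
        target = min(candidates, key=lambda q: (usage_ratio(q), q))
        allocated[target] += 1
        remaining[target] -= 1
    return allocated


tiers = {"tier0": 100, "tier1": 100, "tier2": 100, "tier3": 100}
# Hypothetical guarantees in vCores, for illustration only.
guarantees = {"tier0": 40, "tier1": 20, "tier2": 10, "tier3": 5}
print(allocate_by_guarantee(tiers, guarantees, free_vcores=76))
# → {'tier0': 41, 'tier1': 20, 'tier2': 10, 'tier3': 5}
```

Every tier reaches its (hypothetical) guarantee before any tier receives more; the single leftover vCore goes to tier0 only because of the name tie-break.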

```
> kubectl get pods -n finance | grep tier-0 | grep Pending | wc -l
   86
> kubectl get pods -n finance | grep tier-1 | grep Pending | wc -l
   83
> kubectl get pods -n finance | grep tier-2 | grep Pending | wc -l
   78
> kubectl get pods -n finance | grep tier-3 | grep Pending | wc -l
   77
```
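Assuming each tier submitted exactly 100 pods and every pod not reported as Pending has been bound to a node, the counts above translate into 14, 17, 22 and 23 bound pods for Tier0 through Tier3 (76 in total). That matches the report: Tier0 received the fewest pods and Tier3 the most. The arithmetic, as a small sketch:

```python
# Pending counts from the kubectl commands above; 100 pods per tier.
SUBMITTED_PER_TIER = 100
pending = {"tier0": 86, "tier1": 83, "tier2": 78, "tier3": 77}

# Assumes every pod not reported as Pending was bound to a node.
scheduled = {tier: SUBMITTED_PER_TIER - n for tier, n in pending.items()}
total = sum(scheduled.values())

for tier, n in scheduled.items():
    print(f"{tier}: {n} pods bound ({n / total:.0%} of {total})")
```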

Please see the attached screenshots for queue usage.

Note that this situation can also be reproduced without Karpenter by setting Yunikorn's `service.schedulingInterval` to a long duration, say 1m. This forces Yunikorn to react to 400 Pods ~~across 4 queues~~ at roughly the same time, so it has to prioritize allocations between queues.
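One way to apply the 1m interval is to patch the ConfigMap that YuniKorn reads its service settings from, using the same Python client as the test code. This is a minimal sketch that assumes the ConfigMap is named `yunikorn-configs` and lives in the `yunikorn` namespace (both are assumptions about a typical Helm install; check your deployment). The helper names are hypothetical, and a scheduler restart may be needed before the new interval is picked up.

```python
from typing import Dict

# Assumed names for a typical install; adjust to your deployment.
CONFIGMAP_NAME = "yunikorn-configs"
CONFIGMAP_NAMESPACE = "yunikorn"


def scheduling_interval_patch(interval: str) -> Dict[str, Dict[str, str]]:
    """Build a ConfigMap patch body that sets service.schedulingInterval."""
    return {"data": {"service.schedulingInterval": interval}}


def apply_scheduling_interval(interval: str = "1m") -> None:
    """Patch the (assumed) YuniKorn ConfigMap in the current kube context."""
    # Imported lazily so the patch builder above works without a cluster.
    from kubernetes import client, config

    config.load_kube_config()
    client.CoreV1Api().patch_namespaced_config_map(
        name=CONFIGMAP_NAME,
        namespace=CONFIGMAP_NAMESPACE,
        body=scheduling_interval_patch(interval),
    )
```

Calling `apply_scheduling_interval("1m")` before running the test code, and `apply_scheduling_interval("1s")` afterwards, would bracket the reproduction.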

Test code to generate Pods:

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (the same context kubectl uses).
config.load_kube_config()
v1 = client.CoreV1Api()

NAMESPACE = "finance"
TIERS = 4
PODS_PER_TIER = 100


def create_pod_manifest(tier: str, exec_id: str) -> dict:
    """Build a 1-vCore sleeper pod that YuniKorn places in root.tiers.<tier>."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"rolling-test-tier-{tier}-exec-{exec_id}",
            "namespace": NAMESPACE,
            "labels": {
                # All pods of one tier form a single YuniKorn application.
                "applicationId": f"MyOwnApplicationId-tier-{tier}",
                "queue": f"root.tiers.{tier}",
            },
            # The user info must be an annotation: as a bare metadata key it
            # is not a valid ObjectMeta field and the API server ignores it.
            # YuniKorn's admission controller may reject this annotation
            # unless the submitting user is allowed to set it.
            "annotations": {
                "yunikorn.apache.org/user.info": '{"user":"system:serviceaccount:finance:spark","groups":["system:serviceaccounts","system:serviceaccounts:finance","system:authenticated"]}'
            },
        },
        "spec": {
            # Only run on nodes labelled as dedicated to Spark.
            "affinity": {
                "nodeAffinity": {
                    "requiredDuringSchedulingIgnoredDuringExecution": {
                        "nodeSelectorTerms": [
                            {
                                "matchExpressions": [
                                    {
                                        "key": "di.rbx.com/dedicated",
                                        "operator": "In",
                                        "values": ["spark"],
                                    }
                                ]
                            }
                        ]
                    }
                }
            },
            # Tolerate the taint on those dedicated nodes.
            "tolerations": [
                {
                    "effect": "NoSchedule",
                    "key": "dedicated",
                    "operator": "Equal",
                    "value": "spark",
                }
            ],
            "schedulerName": "yunikorn",
            "restartPolicy": "Always",
            "containers": [
                {
                    "name": "ubuntu",
                    "image": "ubuntu",
                    "command": ["sleep", "604800"],  # one week
                    "imagePullPolicy": "IfNotPresent",
                    # Exactly 1 vCore per pod (requests == limits).
                    "resources": {
                        "limits": {"cpu": "1"},
                        "requests": {"cpu": "1"},
                    },
                }
            ],
        },
    }


# Submit 100 pods to each of the four tier queues (400 pods in total).
for i in range(TIERS):
    tier = str(i)
    for j in range(PODS_PER_TIER):
        exec_id = str(j)
        v1.create_namespaced_pod(namespace=NAMESPACE, body=create_pod_manifest(tier, exec_id))
        print(f"creating tier( {tier} ) exec( {exec_id} )")
```
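The kubectl/grep pipeline above can also be run from the same Python client, which avoids matching on pod names. A minimal sketch, assuming the `queue` label set by the test code; the function names are hypothetical. It counts a pod as waiting only while `spec.node_name` is unset, i.e. before the scheduler binds it, so it can differ slightly from a grep on kubectl's STATUS column.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple


def unbound_per_queue(pods: Iterable[Tuple[str, Optional[str]]]) -> Counter:
    """Count pods per queue that have not been bound to a node yet.

    Each item is a (queue label, node name) pair; the node name is None
    until the scheduler binds the pod.
    """
    return Counter(queue for queue, node in pods if node is None)


def fetch_pods(namespace: str = "finance") -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (queue label, node name) for each pod in the namespace."""
    from kubernetes import client, config  # requires cluster access

    config.load_kube_config()
    response = client.CoreV1Api().list_namespaced_pod(namespace=namespace)
    for pod in response.items:
        labels = pod.metadata.labels or {}
        yield labels.get("queue", "<none>"), pod.spec.node_name
```

Against a live cluster, `print(unbound_per_queue(fetch_pods()))` prints one count per `root.tiers.N` queue.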

Attachments

- jira-tier3-screenshot.png (158 kB), Paul Santa Clara, 17/Jun/24 23:30
- jira-tier2-screenshot.png (153 kB), Paul Santa Clara, 17/Jun/24 23:30
- jira-tier1-screenshot.png (155 kB), Paul Santa Clara, 17/Jun/24 23:30
- jira-tier0-screenshot.png (151 kB), Paul Santa Clara, 17/Jun/24 23:30
- jira-queues.yaml (2 kB), Paul Santa Clara, 17/Jun/24 23:24
