[YUNIKORN-1085] DaemonSet pods may fail to be scheduled on new nodes added during autoscaling

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.12.2
Fix Version/s: 1.1.0
Component/s: shim - kubernetes
Labels: None
Environment: Amazon EKS, K8s 1.20, Cluster Autoscaler

Description

After YUNIKORN-704 was completed, YuniKorn should use the same mechanism as the default scheduler to schedule DaemonSet pods. That is the case most of the time in our deployments. Recently, however, we found that DaemonSet scheduling became problematic again. When the K8s Cluster Autoscaler adds new nodes in response to pending pods in the cluster, EKS automatically creates the CNI DaemonSet pods (Amazon's container networking module), one on each newly created node. YuniKorn could not schedule these pods. There were no informative error messages, and the default queue that these pods belong to had available resources. Because the pods could not be scheduled, EKS refused to mark the new nodes as ready, and they got stuck in the NotReady state. The issue is not always reproducible, but it has happened a few times. The root cause needs further investigation.

Note that when this bug occurred, the mitigation that worked was to disable the YuniKorn admission controller, delete all the pending DaemonSet pods, and wait for the default scheduler to schedule them all, after which the new nodes became Ready. So it seems that there are edge cases, not covered by the previous work, where YuniKorn handles DaemonSet pods differently from the default scheduler.
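For context when triaging this kind of failure: in upstream Kubernetes, the DaemonSet controller pins each of its pods to a single node by adding a required node affinity term that matches the field `metadata.name`, and the default scheduler then binds the pod to that node. The sketch below is only an illustration under stated assumptions, not YuniKorn's implementation or the fix for this issue. It assumes pod manifests are available as plain dicts (for example, parsed from `kubectl get pods -o json`), and the helper names `daemonset_target_node`, `is_daemonset_pod`, and `stuck_daemonset_pods` are hypothetical. It lists Pending DaemonSet pods that have not been bound, together with the node each one is pinned to.

```python
from typing import List, Optional, Tuple


def daemonset_target_node(pod: dict) -> Optional[str]:
    """Return the node name a pod is pinned to via required node affinity, or None."""
    affinity = (pod.get("spec") or {}).get("affinity") or {}
    required = (affinity.get("nodeAffinity") or {}).get(
        "requiredDuringSchedulingIgnoredDuringExecution"
    ) or {}
    for term in required.get("nodeSelectorTerms") or []:
        for field in term.get("matchFields") or []:
            values = field.get("values") or []
            # The DaemonSet controller uses: metadata.name In [<node name>]
            if (
                field.get("key") == "metadata.name"
                and field.get("operator") == "In"
                and len(values) == 1
            ):
                return values[0]
    return None


def is_daemonset_pod(pod: dict) -> bool:
    """True if any owner reference of the pod has kind DaemonSet."""
    owners = (pod.get("metadata") or {}).get("ownerReferences") or []
    return any(owner.get("kind") == "DaemonSet" for owner in owners)


def stuck_daemonset_pods(pods: List[dict]) -> List[Tuple[str, Optional[str]]]:
    """Return (pod name, target node) for Pending DaemonSet pods with no node bound."""
    stuck = []
    for pod in pods:
        phase = (pod.get("status") or {}).get("phase")
        bound_node = (pod.get("spec") or {}).get("nodeName")
        if is_daemonset_pod(pod) and phase == "Pending" and not bound_node:
            stuck.append((pod["metadata"]["name"], daemonset_target_node(pod)))
    return stuck
```

A report like this could help confirm that the pending pods are the ones pinned to the NotReady nodes before applying the mitigation described above.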

Attachments

sampleNode.txt	21/Feb/22 02:04	6 kB	Chaoran Yu
samplePod.yaml	21/Feb/22 02:04	8 kB	Chaoran Yu

Issue Links

causes

YUNIKORN-1395 Account for preempted placeholder in the placeholder data

Closed

relates to

YUNIKORN-704 [Umbrella] Use the same mechanism to schedule daemon set pods as the default scheduler

Closed

YUNIKORN-1289 Publish Daemonset scheduling design doc

Closed

Sub-Tasks

1.	pre-select node for daemon set pod	Closed	Manikandan R
2.	[Core] preempt pods based on the priorities	Closed	Manikandan R
3.	[Shim] preempt pods based on the priorities	Closed	Manikandan R
4.	[SI] preempt pods based on the priorities	Closed	Manikandan R
5.	Add e2e tests for the entire preemption flow	Closed	Manikandan R
6.	pass priority on pods from shim into the core	Closed	Manikandan R

People

Assignee: Manikandan R

Reporter: Chaoran Yu

Votes: 0

Watchers: 6

Dates

Created: 21/Feb/22 00:19

Updated: 07/Feb/23 22:34

Resolved: 08/Aug/22 11:07
