Details
Type: New Feature
Status: In Progress
Priority: Major
Resolution: Unresolved
Description
The YuniKorn scheduler can already allocate pods based on the node affinity policy "requiredDuringScheduling", but it does not yet support "preferredDuringScheduling".
YuniKorn currently performs a full sort of the nodes every time it schedules. After https://github.com/apache/incubator-yunikorn-core/pull/307, nodes will instead be kept sorted incrementally in an ordered B-tree. Because that tree gives one shared node order rather than a per-pod order, directing different pods to their own preferred nodes becomes somewhat more complicated.
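A rough sketch of what incremental ordering means, in Go: instead of re-sorting every node in each cycle, only a node whose score changes is removed and reinserted at its new position. A sorted slice stands in for the ordered B-tree here, and the nodeEntry type, the Upsert method and the scores are hypothetical, not YuniKorn code. A real B-tree would also make the insert and delete themselves logarithmic; the slice only makes the position lookup logarithmic.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeEntry is a hypothetical stand-in for a node in the sorted structure:
// ordered by ascending Score, ties broken by ID.
type nodeEntry struct {
	ID    string
	Score int64
}

func less(a, b nodeEntry) bool {
	if a.Score != b.Score {
		return a.Score < b.Score
	}
	return a.ID < b.ID
}

// sortedNodes keeps its entries ordered at all times, so no full re-sort is needed.
type sortedNodes struct {
	entries []nodeEntry
	scores  map[string]int64 // current score per node ID, used to locate old entries
}

func newSortedNodes() *sortedNodes {
	return &sortedNodes{scores: make(map[string]int64)}
}

// search returns the index of the first entry that is not less than e.
func (s *sortedNodes) search(e nodeEntry) int {
	return sort.Search(len(s.entries), func(i int) bool { return !less(s.entries[i], e) })
}

// Upsert removes the node's old entry (if any) and inserts it at its new position.
func (s *sortedNodes) Upsert(id string, score int64) {
	if old, ok := s.scores[id]; ok {
		i := s.search(nodeEntry{ID: id, Score: old})
		s.entries = append(s.entries[:i], s.entries[i+1:]...)
	}
	e := nodeEntry{ID: id, Score: score}
	i := s.search(e)
	s.entries = append(s.entries, nodeEntry{})
	copy(s.entries[i+1:], s.entries[i:])
	s.entries[i] = e
	s.scores[id] = score
}

// Ascending returns node IDs in ascending order, as an iterator would yield them.
func (s *sortedNodes) Ascending() []string {
	ids := make([]string, 0, len(s.entries))
	for _, e := range s.entries {
		ids = append(ids, e.ID)
	}
	return ids
}

func main() {
	s := newSortedNodes()
	s.Upsert("node-a", 30)
	s.Upsert("node-b", 10)
	s.Upsert("node-c", 20)
	s.Upsert("node-a", 5) // only node-a changed, so only node-a moves
	fmt.Println(s.Ascending())
}
```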
Discussed a potential solution with cheersyang and yuchaoran2011:
Maintain a preferred node list for certain labelled pods (configurable; in this case, e.g. the Spark driver pod).
- Parse node labels in the shim and send them to the core over the SI (as node attributes).
- Parse the pod's node-affinity preference in the shim and send it to the core.
- Extend GetSchedulableNodeIterator(). Today it retrieves nodes directly from the B-tree in ascending order. We would still loop over the nodes once, but keep two lists, one of them for the preferred nodes; when iterating nodes in the scheduling cycle, we iterate the preferred list first.
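A minimal sketch of the proposed iteration order, in Go, under several assumptions: node labels arrive as a plain attribute map, a preferred node-affinity term is reduced to a label key plus a set of accepted values ("In" semantics, weights ignored), and the node, preferenceTerm and preferredFirst names as well as the "pool" label are hypothetical rather than YuniKorn APIs. It loops over the nodes once, in the order they come out of the B-tree, and returns the preferred list followed by the remaining nodes.

```go
package main

import "fmt"

// node is a hypothetical, simplified node: Attributes holds the node labels
// that the shim would forward to the core over the SI.
type node struct {
	ID         string
	Attributes map[string]string
}

// preferenceTerm is a hypothetical, simplified preferred node-affinity term:
// the label Key must have one of Values ("In" semantics).
type preferenceTerm struct {
	Key    string
	Values []string
}

// matches reports whether the node's attributes satisfy the term.
func (t preferenceTerm) matches(n *node) bool {
	v, ok := n.Attributes[t.Key]
	if !ok {
		return false
	}
	for _, want := range t.Values {
		if v == want {
			return true
		}
	}
	return false
}

// preferredFirst walks the nodes once, in their sorted order, and splits them into
// nodes matching any preference term and all other nodes. Relative order within
// each list is preserved.
func preferredFirst(sorted []*node, terms []preferenceTerm) []*node {
	var preferred, others []*node
	for _, n := range sorted {
		isPreferred := false
		for _, t := range terms {
			if t.matches(n) {
				isPreferred = true
				break
			}
		}
		if isPreferred {
			preferred = append(preferred, n)
		} else {
			others = append(others, n)
		}
	}
	// The scheduling cycle tries the preferred nodes before the rest.
	return append(preferred, others...)
}

func main() {
	// Nodes as they would come out of the B-tree in ascending order.
	sorted := []*node{
		{ID: "node-1", Attributes: map[string]string{"pool": "general"}},
		{ID: "node-2", Attributes: map[string]string{"pool": "driver"}},
		{ID: "node-3", Attributes: map[string]string{}},
		{ID: "node-4", Attributes: map[string]string{"pool": "driver"}},
	}
	// A hypothetical preference, e.g. for a Spark driver pod.
	terms := []preferenceTerm{{Key: "pool", Values: []string{"driver"}}}
	for _, n := range preferredFirst(sorted, terms) {
		fmt.Println(n.ID)
	}
}
```

With these example labels the order is node-2, node-4, node-1, node-3: the preferred nodes come first and both groups keep their ascending B-tree order. With no preference terms the original order is returned unchanged, so pods without a preference would iterate nodes exactly as they do today.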
Adding wilfreds, kmarton and ccondit-target for further discussion.
Slack discussion: https://yunikornworkspace.slack.com/archives/CL9CRJ1KM/p1630088069009500