[SPARK-36057] SPIP: Support Customized Kubernetes Schedulers - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Done
Affects Version/s: 3.3.0
Fix Version/s: 3.3.1
Component/s: Kubernetes
Labels:
- SPIP

Description

This is an umbrella issue for tracking the work for supporting Volcano & YuniKorn on Kubernetes. These schedulers provide more YARN-like features (such as queues, and reserving a minimum amount of resources before a job is scheduled) that many folks want on Kubernetes.
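
In Volcano, both of these features are expressed through a PodGroup: the group is placed in a queue, and `minResources` states how much CPU and memory must be available before its pods are scheduled. The sketch below renders such a PodGroup template as YAML text. It assumes the `scheduling.volcano.sh/v1beta1` PodGroup fields used in the Spark 3.3 Volcano template example; the helper name `render_podgroup_template` is hypothetical and the values are placeholders.

```python
def render_podgroup_template(queue: str, min_cpu: str, min_memory: str,
                             min_member: int = 1) -> str:
    """Return a Volcano PodGroup template as YAML text.

    Hypothetical helper; field names follow Volcano's
    scheduling.volcano.sh/v1beta1 PodGroup spec.
    """
    return "\n".join([
        "apiVersion: scheduling.volcano.sh/v1beta1",
        "kind: PodGroup",
        "spec:",
        # Minimum number of pods that must be schedulable together.
        f"  minMember: {min_member}",
        # Resources Volcano must be able to satisfy before scheduling the group.
        "  minResources:",
        f'    cpu: "{min_cpu}"',
        f'    memory: "{min_memory}"',
        # Volcano queue the job is submitted to.
        f"  queue: {queue}",
        "",
    ])


if __name__ == "__main__":
    print(render_podgroup_template(queue="default", min_cpu="2", min_memory="3Gi"))
```

Written to a file, the template would be referenced from Spark via `spark.kubernetes.scheduler.volcano.podGroupTemplateFile`, assuming the final configuration name after the `spark.kubernetes.scheduler.NAME` prefix change (sub-task 21).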

YuniKorn is an ASF project & Volcano is a CNCF project (sig-batch).

They've taken slightly different approaches to solving the same problem, but from Spark's point of view we should be able to share much of the code.
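
As a sketch of how much can be shared, the snippet below builds `spark-submit --conf` arguments for either scheduler: only `spark.kubernetes.scheduler.name` is common, and each scheduler adds a few extra settings. The configuration keys follow the Spark 3.3 "Running on Kubernetes" guide (Volcano feature step and PodGroup template; YuniKorn queue label and `{{APP_ID}}` annotation) and should be checked against the docs for your Spark version; the helper functions themselves are hypothetical.

```python
from typing import Dict, List


def common_conf(scheduler: str) -> Dict[str, str]:
    """Settings shared by every customized scheduler."""
    return {"spark.kubernetes.scheduler.name": scheduler}


def volcano_conf(podgroup_template: str) -> Dict[str, str]:
    """Volcano: enable the Volcano feature step and point at a PodGroup template."""
    step = "org.apache.spark.deploy.k8s.features.VolcanoFeatureStep"
    conf = common_conf("volcano")
    conf.update({
        "spark.kubernetes.scheduler.volcano.podGroupTemplateFile": podgroup_template,
        "spark.kubernetes.driver.pod.featureSteps": step,
        "spark.kubernetes.executor.pod.featureSteps": step,
    })
    return conf


def yunikorn_conf(queue: str) -> Dict[str, str]:
    """YuniKorn: label pods with a queue and annotate them with the app ID."""
    conf = common_conf("yunikorn")
    for role in ("driver", "executor"):
        conf[f"spark.kubernetes.{role}.label.queue"] = queue
        # Spark substitutes the {{APP_ID}} placeholder at pod-build time.
        conf[f"spark.kubernetes.{role}.annotation.yunikorn.apache.org/app-id"] = "{{APP_ID}}"
    return conf


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Flatten a conf dict into spark-submit --conf arguments (sorted for stability)."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(volcano_conf("/path/to/podgroup-template.yaml"))))
    print(" ".join(to_submit_args(yunikorn_conf("root.default"))))
```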

See the initial brainstorming discussion in SPARK-35623.

DISCUSSION: https://lists.apache.org/thread/zv3o62xrob4dvgkbftbv5w5wy75hkbxg

VOTE: https://lists.apache.org/thread/cz3cpp8q4pgmh7h35h6lvkwf6g3lwhcd

VOTE Result: https://lists.apache.org/thread/nvwfo0yo0q8997vs86o7wkjyby4tbp0m

Design DOC: https://docs.google.com/document/d/1xgQGRpaHQX6-QH_J9YV2C2Dh6RpXefUpLM7KGkzL6Fg

Recap slide: https://lists.apache.org/thread/mwswfwkycj71npwz8gmv1r5nrvpwj77s

Issue Links

is related to
- SPARK-38396 Improve K8s Integration Tests (Resolved)
- SPARK-42802 Customized K8s Scheduler GA (Resolved)

links to
- [GitHub] Pull Request #35015 (Yikun)

Sub-Tasks

1.	Support replicasets/job API	Resolved	Holden Karau
2.	Add the ability to specify a scheduler	Resolved	Yikun Jiang
3.	Support for specifying executor/driver node selector	Resolved	Yikun Jiang
4.	Add the ability to create resources before driver pod	Resolved	Yikun Jiang
5.	Add appId interface to KubernetesConf	Resolved	Yikun Jiang
6.	Add KubernetesCustom[Driver/Executor]FeatureConfigStep developer API	Resolved	Yikun Jiang
7.	Upgrade kubernetes-client to 5.12.0	Resolved	Yikun Jiang
8.	Upgrade kubernetes-client to 5.12.2	Resolved	Yikun Jiang
9.	Add `volcano` module and feature step	Resolved	Yikun Jiang
10.	Support queue scheduling (Introduce queue) with volcano implementations	Resolved	Yikun Jiang
11.	Add volcano section to K8s IT README.md	Resolved	Yikun Jiang
12.	Support priority scheduling with volcano implementations	Resolved	Yikun Jiang
13.	Bump minimum Volcano version to v1.5.1	Resolved	Yikun Jiang
14.	Fix Volcano weight to be positive integer and use cpu capability instead	Resolved	Yikun Jiang
15.	Support APP_ID and EXECUTOR_ID placeholder in annotations	Resolved	Dongjoon Hyun
16.	Support driver/executor PodGroup templates	Resolved	Dongjoon Hyun
17.	Support resource reservation (Introduce minCPU/minMemory) with volcano implementations	Resolved	Dongjoon Hyun
18.	Remove spark.kubernetes.job.queue in favor of spark.kubernetes.driver.podGroupTemplateFile	Resolved	Dongjoon Hyun
19.	Set the minimum Volcano version	Resolved	Dongjoon Hyun
20.	Remove priorityClassName propagation in favor of explicit settings	Resolved	Dongjoon Hyun
21.	Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix	Resolved	Dongjoon Hyun
22.	Unify Statefulset* to StatefulSet*	Resolved	Dongjoon Hyun
23.	Add YuniKornSuite	Resolved	Dongjoon Hyun
24.	Use YuniKorn v1.1+	Resolved	Dongjoon Hyun
25.	Add explicit YuniKorn queue submission test coverage	Resolved	Dongjoon Hyun
26.	Add doc for using Apache YuniKorn as a customized scheduler	Resolved	Weiwei Yang
27.	Volcano feature doesn't work on EKS graviton instances	Resolved	Yikun Jiang
28.	Volcano queue is not deleted	Resolved	Yikun Jiang
29.	Introduce `spark.kubernetes.job` scheduling related configurations	Closed	Unassigned
30.	Support backing off dynamic allocation increases if resources are "stuck"	Closed	Unassigned
31.	[Deprecated] Support the Volcano Job API	Closed	Unassigned
32.	Check resource after resource creation	Closed	Unassigned
33.	[CI] Introduce Spark on Kubernetes CI into Volcano community	Closed	Unassigned
34.	Add doc for "Customized Kubernetes Schedulers"	Resolved	Yikun Jiang
35.	Add doc for Volcano scheduler	Resolved	Yikun Jiang
36.	Add yunikorn feature step	Resolved	Unassigned
37.	Support job queue in YuniKorn feature step	Resolved	Unassigned
38.	Fix doc format/syntax error	Resolved	Yikun Jiang
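
Sub-task 15 lets annotation values contain `{{APP_ID}}` and `{{EXECUTOR_ID}}` placeholders, which is what the YuniKorn `app-id` annotation relies on. The sketch below illustrates that substitution in Python; it is not Spark's Scala implementation, and the helper names and the `example.com/exec` annotation key are hypothetical.

```python
from typing import Dict


def substitute_ids(value: str, app_id: str, executor_id: str) -> str:
    """Replace the {{APP_ID}} and {{EXECUTOR_ID}} placeholders in a value."""
    return value.replace("{{APP_ID}}", app_id).replace("{{EXECUTOR_ID}}", executor_id)


def render_annotations(conf: Dict[str, str], prefix: str,
                       app_id: str, executor_id: str) -> Dict[str, str]:
    """Collect annotations configured under `prefix` and fill in the placeholders."""
    return {
        key[len(prefix):]: substitute_ids(value, app_id, executor_id)
        for key, value in conf.items()
        if key.startswith(prefix)
    }


if __name__ == "__main__":
    conf = {
        "spark.kubernetes.executor.annotation.yunikorn.apache.org/app-id": "{{APP_ID}}",
        "spark.kubernetes.executor.annotation.example.com/exec": "{{APP_ID}}-{{EXECUTOR_ID}}",
        "spark.executor.memory": "4g",  # not an annotation, so it is ignored
    }
    print(render_annotations(conf, "spark.kubernetes.executor.annotation.", "spark-123", "7"))
```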

People

Assignee: Yikun Jiang

Reporter: Holden Karau

Shepherd: Holden Karau

Votes: 4

Watchers: 35

Dates

Created: 08/Jul/21 17:47

Updated: 15/Mar/23 06:46

Resolved: 24/Oct/22 19:25