[SPARK-41449] Stage level scheduling, allow to change number of executors - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.3.0, 3.3.1
Fix Version/s: None
Component/s: Scheduler
Labels:
- scheduler

Description

Since the total/max number of executor is constant throughout the application - in dynamic or static allocation - there is loose control over how much GPUs will be requested from the resource manager.

For example, if an application needs 500 executors for the ETL part (with N cores each), but it needs - or allowed - only 50 GPUs for the DL part, in practice it will request at least 500 GPUs from the RM, since `spark.executor.instances` is set to 500. This leads to resource management challenges in multi tenant environments.

A quick workaround is to repartition the RDD to 50 partitions just before switching resources, but it has obvious downsides.

It would be very helpful if the total/max number of executors could also be configured in the Resource Profile.

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Shay Elbaz

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 08/Dec/22 11:48

Updated:: 08/Dec/22 11:50