[SPARK-31856] Handle locality wait reset better when executors added - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.1.0
Fix Version/s: None
Component/s: Scheduler
Labels: None

Description

We found a corner case with the new locality algorithm where we were not effectively resetting the locality when starting without executors. See https://github.com/apache/spark/pull/28656/

In that fix we reset the locality when a new executor is added and that changes the valid locality levels. This can reset it when you don't want it to, for instance if the task set has been around for quite a while and there are other executors it could put tasks on. Resetting back to the most local locality level could delay the scheduling of tasks that have already been waiting for more than the locality wait.

This goes back to the fact that the new algorithm tries to use a heuristic for this, but it's not perfect. Ideally I think we would still track this per executor, but that is a much larger change.

Investigate to see if we can handle this better.
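
To make the concern concrete, here is a minimal toy model of the reset behavior described above. It is only a sketch and not Spark's scheduler code: `ToyTaskSet`, `allowed_level`, `executor_added`, and `demo` are hypothetical names, and it assumes a single 3-second wait at every level (the default of `spark.locality.wait`) and an unconditional reset to the most local level whenever the valid levels change.

```python
from dataclasses import dataclass


@dataclass
class ToyTaskSet:
    """Toy model of delay scheduling for one task set (not Spark's TaskSetManager)."""
    valid_levels: list          # ordered from most local to least local
    wait_s: float = 3.0         # same wait at every level (assumption)
    index: int = 0              # position in valid_levels currently allowed
    last_reset: float = 0.0     # time the locality wait timer was last reset

    def allowed_level(self, now: float) -> str:
        # Fall back one level each time a full wait has elapsed.
        while (self.index < len(self.valid_levels) - 1
               and now - self.last_reset >= self.wait_s):
            self.last_reset += self.wait_s
            self.index += 1
        return self.valid_levels[self.index]

    def executor_added(self, new_levels: list, now: float) -> None:
        # Modeled reset: if the valid levels changed, restart at the most local one.
        if new_levels != self.valid_levels:
            self.valid_levels = new_levels
            self.index = 0
            self.last_reset = now


def demo() -> list:
    ts = ToyTaskSet(valid_levels=["NODE_LOCAL", "ANY"])
    seen = [ts.allowed_level(10.0)]  # the task set has already waited past the limit
    # At t=10 a new executor makes PROCESS_LOCAL a valid level.
    ts.executor_added(["PROCESS_LOCAL", "NODE_LOCAL", "ANY"], now=10.0)
    for t in (10.0, 13.0, 16.0):
        seen.append(ts.allowed_level(t))
    return seen


if __name__ == "__main__":
    print(demo())
```

In this model the task set may schedule anywhere at t=10, but after the reset it does not get back to ANY until t=16, even though the other executors were available the whole time.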

People

Assignee: Unassigned

Reporter: Thomas Graves

Votes: 0

Watchers: 4

Dates

Created: 28/May/20 14:26

Updated: 28/May/20 14:26