[SPARK-4383] Delay scheduling doesn't work right when jobs have tasks with different locality levels


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.2, 1.1.0
    • Fix Version/s: 1.3.0
    • Component/s: Scheduler, Spark Core
    • Labels: None

    Description

      Copied from mailing list discussion:

      Our application loads data from HDFS in the same Spark cluster, so it gets both NODE_LOCAL and RACK_LOCAL tasks during the loading stage. If all tasks in the loading stage have the same locality level, either NODE_LOCAL or RACK_LOCAL, it works fine.
      But if the tasks in the loading stage have mixed locality levels, such as 3 NODE_LOCAL tasks and 2 RACK_LOCAL tasks, then the TaskSetManager of the loading stage submits the 3 NODE_LOCAL tasks as soon as resources are offered, then waits for spark.locality.wait.node (set to 30 minutes in our case), so the 2 RACK_LOCAL tasks wait 30 minutes even though resources are available.

      Fixing this is quite tricky – do we need to track the locality level individually for each task?
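
      To make the reported behavior concrete, here is a minimal sketch (not Spark's actual TaskSetManager code; names and signatures are illustrative) of how a single per-task-set locality level produces the stall: the task set only relaxes from NODE_LOCAL to RACK_LOCAL after the configured node-level wait elapses, so the RACK_LOCAL tasks are gated even when executors are idle.

      ```java
      public class DelaySchedulingSketch {
          enum Locality { NODE_LOCAL, RACK_LOCAL }

          // Illustrative stand-in for the TaskSetManager's single "currently
          // allowed" locality level: it relaxes from NODE_LOCAL to RACK_LOCAL
          // only after nodeWaitMs (spark.locality.wait.node) with no
          // node-local task launch.
          static Locality allowedLevel(long waitedMs, long nodeWaitMs) {
              return waitedMs >= nodeWaitMs ? Locality.RACK_LOCAL
                                            : Locality.NODE_LOCAL;
          }

          public static void main(String[] args) {
              long nodeWaitMs = 30L * 60 * 1000; // 30 minutes, as in the report

              // Right after the 3 NODE_LOCAL tasks launch, the 2 RACK_LOCAL
              // tasks are still blocked by the node-level wait, despite free
              // resources:
              System.out.println(allowedLevel(0, nodeWaitMs));

              // Only once the full wait has elapsed may RACK_LOCAL tasks run:
              System.out.println(allowedLevel(nodeWaitMs, nodeWaitMs));
          }
      }
      ```

      Tracking a separate wait per task (or per pending-task locality bucket) would let the scheduler offer the RACK_LOCAL tasks immediately, which is the question raised above.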

      Attachments

        Issue Links

          Activity

            People

              Assignee: Unassigned
              Reporter: Kay Ousterhout
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: