Details
- Bug
- Status: Resolved
- Major
- Resolution: Incomplete
- 1.5.0
- None
Description
Follow-up to SPARK-5259. During a stage retry, it's possible for a stage to "complete" (by registering all of its map output and starting the downstream stages) before the latest task set has completed. The earlier task set then continues to submit tasks that are unnecessary and that increase the chance of hitting SPARK-8029.
Spark should mark all task sets for a stage as zombie as soon as the stage's map output is registered. Note that this requires coordination between the scheduler components (at least DAGScheduler and TaskSetManager), which isn't easily testable with the current setup.
To be clear, this is not just about canceling running tasks (which may be taken care of by SPARK-2666). The point is to mark the task set as a zombie so that no new tasks are submitted from it.
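The proposed behavior can be illustrated with a toy model. This is only a sketch under stated assumptions, not Spark's scheduler code: `TaskSetAttempt`, `ToyScheduler`, `submit_attempt`, `register_map_output`, and `next_task` are hypothetical names made up for the illustration, and the real coordination between DAGScheduler and TaskSetManager is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSetAttempt:
    """Hypothetical stand-in for one task set (attempt) of a stage."""
    stage_id: int
    attempt: int
    pending: list = field(default_factory=list)  # partitions not yet launched
    is_zombie: bool = False

    def next_task(self):
        # A zombie attempt launches nothing new; tasks it already
        # launched are not canceled by this flag.
        if self.is_zombie or not self.pending:
            return None
        return self.pending.pop(0)


class ToyScheduler:
    """Hypothetical stand-in for the scheduler side of the coordination."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.map_outputs = {}  # stage_id -> set of registered partitions
        self.attempts = {}     # stage_id -> list of TaskSetAttempt

    def submit_attempt(self, stage_id, partitions):
        attempts = self.attempts.setdefault(stage_id, [])
        task_set = TaskSetAttempt(stage_id, len(attempts), list(partitions))
        attempts.append(task_set)
        return task_set

    def register_map_output(self, stage_id, partition):
        done = self.map_outputs.setdefault(stage_id, set())
        done.add(partition)
        if len(done) == self.num_partitions:
            # Proposed behavior: once all map output for the stage is
            # registered, mark every attempt of the stage as zombie.
            for task_set in self.attempts.get(stage_id, []):
                task_set.is_zombie = True


# Scenario: the first attempt still has partition 2 pending when the
# retry registers the last missing map output.
sched = ToyScheduler(num_partitions=3)
old = sched.submit_attempt(0, [0, 1, 2])
old.next_task()  # launches partition 0
old.next_task()  # launches partition 1

retry = sched.submit_attempt(0, [0, 1, 2])
launched = [retry.next_task() for _ in range(3)]
for partition in launched:
    sched.register_map_output(0, partition)

# The earlier attempt is now a zombie and offers no further tasks,
# even though partition 2 is still in its pending list.
leftover = old.next_task()
```

Without the loop in `register_map_output`, `old.next_task()` would hand out partition 2 after the stage had already completed, which is the unnecessary submission described above.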
Issue Links
- is blocked by
  - SPARK-10372 Add end-to-end tests for the scheduling code (Resolved)
  - SPARK-5259 Do not submit stage until its dependencies map outputs are registered (Resolved)
- links to