Details
- Bug
- Status: Resolved
- Minor
- Resolution: Fixed
- 2.3.0
- None
Description
Track tasks separately for each stage attempt (instead of tracking by stage), and do NOT reset the numRunningTasks to 0 on StageCompleted.
When a stage is retried, taskEnd events from the zombie stage attempt can drive totalRunningTasks negative, which causes the job to get stuck.
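The failure mode and the proposed fix can be illustrated with a small model. This is a minimal sketch in Python, not Spark's actual listener code: the class names, method names, and the event sequence in replay are hypothetical, and the "reset when no stage is active" behaviour is an assumption about the old bookkeeping based on the description above.

```python
from collections import defaultdict


class PerStageTracker:
    """Assumed old-style bookkeeping: one global counter, reset on stage completion."""

    def __init__(self):
        self.active_stages = set()
        self.num_running_tasks = 0

    def on_stage_submitted(self, stage_id, attempt_id):
        self.active_stages.add(stage_id)

    def on_stage_completed(self, stage_id, attempt_id):
        self.active_stages.discard(stage_id)
        if not self.active_stages:
            # The reset that this issue proposes to remove.
            self.num_running_tasks = 0

    def on_task_start(self, stage_id, attempt_id):
        self.num_running_tasks += 1

    def on_task_end(self, stage_id, attempt_id):
        self.num_running_tasks -= 1

    def total_running_tasks(self):
        return self.num_running_tasks


class PerAttemptTracker:
    """Proposed bookkeeping: running tasks counted per (stage, attempt), no reset."""

    def __init__(self):
        self.running = defaultdict(int)  # (stage_id, attempt_id) -> running count

    def on_stage_submitted(self, stage_id, attempt_id):
        pass

    def on_stage_completed(self, stage_id, attempt_id):
        pass  # zombie tasks may still be running, so nothing is reset here

    def on_task_start(self, stage_id, attempt_id):
        self.running[(stage_id, attempt_id)] += 1

    def on_task_end(self, stage_id, attempt_id):
        key = (stage_id, attempt_id)
        if key in self.running:
            self.running[key] -= 1
            if self.running[key] == 0:
                del self.running[key]

    def total_running_tasks(self):
        return sum(self.running.values())


def replay(tracker):
    """Replay a hypothetical stage retry with late taskEnd events from attempt 0."""
    tracker.on_stage_submitted(1, 0)
    tracker.on_task_start(1, 0)
    tracker.on_task_start(1, 0)
    tracker.on_stage_completed(1, 0)  # attempt 0 fails; its tasks become zombies
    tracker.on_stage_submitted(1, 1)  # retry
    tracker.on_task_start(1, 1)
    tracker.on_task_end(1, 0)         # late zombie event
    tracker.on_task_end(1, 0)         # late zombie event
    return tracker.total_running_tasks()


print(replay(PerStageTracker()))    # -1
print(replay(PerAttemptTracker()))  # 1
```

In this model exactly one task (from attempt 1) is still running at the end. The per-stage tracker reports -1 because the reset discarded the two zombie tasks before their taskEnd events arrived; the per-attempt tracker reports 1.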
A similar problem exists with stageIdToTaskIndices and stageIdToSpeculativeTaskIndices.
A failed taskEnd event from the zombie stage attempt causes stageIdToTaskIndices or stageIdToSpeculativeTaskIndices to remove a task index that belongs to the active stage attempt, so totalPendingTasks increases unexpectedly.
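The task-index problem can be sketched the same way. The following is a simplified Python model under stated assumptions, not Spark's implementation: PendingTaskTracker, its methods, and the key_by_attempt switch are hypothetical names, and pending tasks are assumed to be computed as the number of tasks in a stage minus the number of started task indices.

```python
class PendingTaskTracker:
    """Tracks started task indices, keyed by stage or by (stage, attempt)."""

    def __init__(self, key_by_attempt):
        self.key_by_attempt = key_by_attempt
        self.num_tasks = {}  # key -> number of tasks in the stage attempt
        self.started = {}    # key -> set of task indices that have started

    def _key(self, stage_id, attempt_id):
        return (stage_id, attempt_id) if self.key_by_attempt else stage_id

    def on_stage_submitted(self, stage_id, attempt_id, num_tasks):
        key = self._key(stage_id, attempt_id)
        self.num_tasks[key] = num_tasks
        self.started[key] = set()

    def on_stage_completed(self, stage_id, attempt_id):
        key = self._key(stage_id, attempt_id)
        self.num_tasks.pop(key, None)
        self.started.pop(key, None)

    def on_task_start(self, stage_id, attempt_id, index):
        key = self._key(stage_id, attempt_id)
        if key in self.started:
            self.started[key].add(index)

    def on_task_end(self, stage_id, attempt_id, index, succeeded):
        # A failed task is assumed to go back to the pending pool.
        if not succeeded:
            key = self._key(stage_id, attempt_id)
            if key in self.started:
                self.started[key].discard(index)

    def total_pending_tasks(self):
        return sum(n - len(self.started[k]) for k, n in self.num_tasks.items())


def replay_zombie_failure(tracker):
    """Hypothetical sequence: attempt 0 is retried, then one of its tasks fails late."""
    tracker.on_stage_submitted(1, 0, num_tasks=2)
    tracker.on_task_start(1, 0, 0)
    tracker.on_task_start(1, 0, 1)
    tracker.on_stage_completed(1, 0)               # attempt 0 becomes a zombie
    tracker.on_stage_submitted(1, 1, num_tasks=2)  # retry
    tracker.on_task_start(1, 1, 0)
    tracker.on_task_start(1, 1, 1)
    tracker.on_task_end(1, 0, 0, succeeded=False)  # failed zombie task, index 0
    return tracker.total_pending_tasks()


print(replay_zombie_failure(PendingTaskTracker(key_by_attempt=False)))  # 1
print(replay_zombie_failure(PendingTaskTracker(key_by_attempt=True)))   # 0
```

Both tasks of attempt 1 are running at the end, so the correct pending count in this model is 0. Keying by stage alone lets the zombie failure remove index 0 from the active attempt's set and report 1 pending task; keying by (stage, attempt) leaves the active attempt untouched.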
Issue Links
- is related to: SPARK-11334 numRunningTasks can't be less than 0, or it will affect executor allocation (Resolved)
- links to