Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
3.2.0
None
Description
`TaskSchedulerImpl` handles task-finished events in `handleSuccessfulTask` and `handleFailedTask`, but in some cases the task is already in a finished state, and the task-finished event should then be ignored.
Case description:
When an executor finishes a task of some stage, the driver receives a StatusUpdate event and handles it. At the same time, the driver may find that the executor's heartbeat has timed out, so the driver also needs to handle an ExecutorLost event simultaneously. This is a race condition, and it can leave TaskSetManager.successful and TaskSetManager.tasksSuccessful with wrong values. A more detailed description and discussion can be found at https://issues.apache.org/jira/browse/SPARK-36575 and https://github.com/apache/spark/pull/33872
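
The idea of ignoring a finish event for a task that is already finished can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the change made in the pull request: `ToyTaskSet`, its `finished` list, and the Python method names are hypothetical stand-ins for the driver-side bookkeeping, and the two events are processed one after the other rather than on separate threads.

```python
class ToyTaskSet:
    """Hypothetical bookkeeping for one task set (not Spark's TaskSetManager)."""

    def __init__(self, num_tasks):
        self.successful = [False] * num_tasks   # per-task success flag
        self.finished = [False] * num_tasks     # True once any terminal event is recorded
        self.tasks_successful = 0               # count of successful tasks

    def handle_successful_task(self, index):
        # Assumption: a task that already reached a terminal state must not
        # be counted again, so the late event is dropped.
        if self.finished[index]:
            return False
        self.finished[index] = True
        self.successful[index] = True
        self.tasks_successful += 1
        return True

    def handle_failed_task(self, index):
        # Same guard on the failure path.
        if self.finished[index]:
            return False
        self.finished[index] = True
        return True


ts = ToyTaskSet(num_tasks=2)
# Executor-lost handling records task 0 as failed first...
first = ts.handle_failed_task(0)
# ...then the late success report for the same task arrives and is ignored.
late = ts.handle_successful_task(0)
ok = ts.handle_successful_task(1)
print(first, late, ok, ts.successful, ts.tasks_successful)
# prints: True False True [False, True] 1
```

Without the `finished` check, the late success report would flip `successful[0]` to true and increment the counter even though the task had already been treated as failed, which is the kind of inconsistent result described above. A real scheduler would also need to serialize the two handlers, which this sketch does not model.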