Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
If the slave dies after checkpointing a queued task but before that task is launched on an executor, the slave does not have enough information to relaunch it, because we checkpoint only Task rather than TaskInfo.
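To make the information loss concrete, here is a minimal sketch that uses hypothetical, simplified stand-ins for the two messages. Only the fields relevant to this bug are modeled, and `to_checkpointed_task` is an illustrative name, not a Mesos function. Once a TaskInfo has been reduced to a Task, its `data` payload is gone, so nothing remains from which a relaunch could be rebuilt.

```python
from dataclasses import dataclass


@dataclass
class TaskInfo:
    """Hypothetical stand-in for TaskInfo: what the executor needs to run a task."""
    task_id: str
    name: str
    data: bytes  # opaque payload for the executor; may be very large


@dataclass
class Task:
    """Hypothetical stand-in for the checkpointed Task: it has no `data` field."""
    task_id: str
    name: str


def to_checkpointed_task(info: TaskInfo) -> Task:
    # Only the small descriptive fields survive checkpointing; `data` is dropped.
    return Task(task_id=info.task_id, name=info.name)


queued = TaskInfo(task_id="task-1", name="example", data=b"x" * (1 << 20))
recovered = to_checkpointed_task(queued)
print(hasattr(recovered, "data"))  # False: the payload cannot be recovered
```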
When the executor re-registers, the slave should simply remove these tasks from the executor's task map.
Alternatively, the slave could checkpoint TaskInfo instead of Task. We don't do this because TaskInfo.data can potentially be huge.
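The proposed fix (dropping unlaunched tasks when the executor re-registers) might look roughly like the sketch below. It rests on an assumption not stated in this issue: during re-registration the executor reports the IDs of the tasks it knows about, and any recovered task missing from that report was still queued when the slave died. `RecoveredExecutor` and `reregister` are hypothetical names, not the actual Mesos slave code.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveredExecutor:
    """Hypothetical slave-side view of an executor after the slave restarts."""
    executor_id: str
    # Tasks rebuilt from checkpointed Task records: task ID -> task name.
    tasks: dict[str, str] = field(default_factory=dict)

    def reregister(self, reported_task_ids: set[str]) -> list[str]:
        """Remove recovered tasks that the re-registering executor does not report.

        Assumption: such a task never reached the executor, and since only its
        Task (not its TaskInfo) was checkpointed, it cannot be relaunched.
        Returns the IDs of the removed tasks.
        """
        unlaunched = [tid for tid in self.tasks if tid not in reported_task_ids]
        for tid in unlaunched:
            del self.tasks[tid]
        return unlaunched


executor = RecoveredExecutor("exec-1", {"task-1": "web", "task-2": "batch"})
removed = executor.reregister({"task-1"})
print(removed)                 # ['task-2']
print(sorted(executor.tasks))  # ['task-1']
```

Reconciling against the executor's report keeps the checkpoint small, at the cost of discarding tasks that never started rather than relaunching them. That is the trade-off against checkpointing the full TaskInfo.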