Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
In Spark, when a stage is recomputed, the DAGScheduler explicitly calls unregisterAllMapAndMergeOutput and then resubmits the stage. At this point, Celeborn receives two results for the same map task that differ only in their stage attempt IDs.
For example: "stage 0 stageAttempt 0 map 10 taskAttempt 0" and "stage 0 stageAttempt 1 map 10 taskAttempt 0"
Celeborn distinguishes map results only by shuffleId, mapId, and taskAttemptId, so it cannot tell apart two results that differ only in stageAttemptId.
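
The collision can be illustrated with a minimal sketch. This is not Celeborn code: the class MapResultKeyDemo, the string key format, and the helpers key and distinctKeys are hypothetical, and shuffleId 0 is assumed for the example stage. It only shows that a key built from shuffleId, mapId, and taskAttemptId maps both results from the example to the same entry, while a key that also carries stageAttemptId keeps them apart. Whether the actual fix takes this form is not stated here.

```java
import java.util.HashSet;
import java.util.Set;

public class MapResultKeyDemo {
    // Hypothetical key for a map result. When includeStageAttempt is false,
    // the key uses only the three fields named in the description.
    static String key(int shuffleId, int stageAttemptId, int mapId,
                      int taskAttemptId, boolean includeStageAttempt) {
        String base = shuffleId + "/" + mapId + "/" + taskAttemptId;
        return includeStageAttempt ? base + "/" + stageAttemptId : base;
    }

    // Counts how many distinct keys the two results from the example produce.
    static int distinctKeys(boolean includeStageAttempt) {
        int shuffleId = 0; // assumption: the shuffle belonging to stage 0
        Set<String> keys = new HashSet<>();
        // "stage 0 stageAttempt 0 map 10 taskAttempt 0"
        keys.add(key(shuffleId, 0, 10, 0, includeStageAttempt));
        // "stage 0 stageAttempt 1 map 10 taskAttempt 0"
        keys.add(key(shuffleId, 1, 10, 0, includeStageAttempt));
        return keys.size();
    }

    public static void main(String[] args) {
        System.out.println("without stageAttemptId: " + distinctKeys(false));
        System.out.println("with stageAttemptId: " + distinctKeys(true));
    }
}
```

Without stageAttemptId the two results collapse into one key; with it they remain two distinct entries.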