Description
When a stage is retried, a shuffle map task may be re-executed even if its previous attempt succeeded. If the retried task is scheduled on the same executor, its output is appended to the existing shuffle data file, while the newly written index file still assumes the data starts at position 0. The shuffle map output then appears corrupt: when the data file is read, the index file points to the wrong offsets.
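A minimal sketch of the failure mode, in Python rather than Spark's actual shuffle writer (the file names, helper functions, and binary layout below are illustrative assumptions, not Spark internals): the buggy writer appends to the data file but rewrites the index with offsets computed from 0, so after a retry the index no longer describes the data file.

```python
import os
import struct
import tempfile

# Hypothetical shuffle writer (NOT Spark code): partition bytes are
# concatenated into a data file, and an index file stores cumulative
# offsets so a reader can seek to partition i.

def write_shuffle_output(data_path, index_path, partitions):
    # Buggy behavior being illustrated: open the data file in append
    # mode ("ab"), but compute offsets from 0 as if the file were fresh.
    offsets = [0]
    with open(data_path, "ab") as f:
        for p in partitions:
            f.write(p)
            offsets.append(offsets[-1] + len(p))
    with open(index_path, "wb") as f:
        for off in offsets:
            f.write(struct.pack(">q", off))  # big-endian int64 offsets

def read_partition(data_path, index_path, i):
    # Reader trusts the index: seek to offsets[i], read up to offsets[i+1].
    with open(index_path, "rb") as f:
        raw = f.read()
    offsets = struct.unpack(">%dq" % (len(raw) // 8), raw)
    with open(data_path, "rb") as f:
        f.seek(offsets[i])
        return f.read(offsets[i + 1] - offsets[i])

d = tempfile.mkdtemp()
data = os.path.join(d, "shuffle.data")
index = os.path.join(d, "shuffle.index")
parts = [b"part0", b"part1!"]
total = sum(len(p) for p in parts)

write_shuffle_output(data, index, parts)   # first, successful attempt
write_shuffle_output(data, index, parts)   # retried attempt, same executor

# The data file now holds two attempts' worth of bytes, but the index
# claims the output ends at `total` -- the index and data file disagree,
# which is the apparent corruption described above.
assert os.path.getsize(data) == 2 * total
assert struct.unpack(">q", open(index, "rb").read()[-8:])[0] == total
```

With a single attempt the reader works correctly; after the retry, the final index offset no longer matches the data file's length, so any consumer that validates lengths (or reads past the first attempt's bytes) sees corrupt shuffle output.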
Issue Links
- is related to: SPARK-7308 Should there be multiple concurrent attempts for one stage? (Resolved)
- relates to: SPARK-8029 ShuffleMapTasks must be robust to concurrent attempts on the same executor (Resolved)