Spark / SPARK-7829

SortShuffleWriter writes inconsistent data & index files on stage retry


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.1
    • Fix Version/s: 1.5.3, 1.6.0
    • Component/s: Shuffle, Spark Core
    • Labels:
      None

      Description

      When a stage is retried, a shuffle map task may be re-run even if its earlier attempt succeeded. If the retry is scheduled on the same executor, the new output is appended to the old data file, while the index file still assumes the data starts at position 0. The shuffle map output then appears corrupt: when the data file is read, the offsets in the index file point to the wrong locations.
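The failure mode can be reproduced with a small, hypothetical simulation. It models a sort-based shuffle map output loosely on Spark's layout: one data file with all reduce partitions back to back, plus an index file of (numPartitions + 1) big-endian 64-bit offsets starting at 0. The file names and the `write_map_output`, `read_partition` and `simulate` helpers are illustrative assumptions, not Spark APIs. Running the same map task twice with append mode reproduces the mismatch described above. Truncating the data file on each write avoids it in this sequential model, but this is not necessarily how Spark fixed the issue, and it would not on its own protect against two attempts writing concurrently.

```python
import os
import struct
import tempfile

# Hypothetical model of one map task's shuffle output: a data file holding every
# reduce partition's bytes back to back, and an index file with
# (num_partitions + 1) big-endian 64-bit offsets, where partition i occupies
# data[offsets[i]:offsets[i + 1]].


def write_map_output(data_path, index_path, partitions, append):
    """Write partition blocks and their index.

    append=True mimics the reported bug: the data file is opened in append
    mode, but the index is always computed as if the data began at offset 0.
    """
    with open(data_path, "ab" if append else "wb") as data:
        for block in partitions:
            data.write(block)
    offsets = [0]
    for block in partitions:
        offsets.append(offsets[-1] + len(block))
    with open(index_path, "wb") as index:  # the index is always rewritten
        for off in offsets:
            index.write(struct.pack(">q", off))


def read_partition(data_path, index_path, reduce_id):
    """Read one reduce partition using the offsets stored in the index file."""
    with open(index_path, "rb") as index:
        index.seek(reduce_id * 8)
        start, end = struct.unpack(">qq", index.read(16))
    with open(data_path, "rb") as data:
        data.seek(start)
        return data.read(end - start)


def simulate(append):
    """Run a map task twice on the same 'executor' and read both partitions."""
    with tempfile.TemporaryDirectory() as tmp:
        # Illustrative file names only.
        data_path = os.path.join(tmp, "shuffle_0_0_0.data")
        index_path = os.path.join(tmp, "shuffle_0_0_0.index")
        # First, successful attempt of the map task.
        write_map_output(data_path, index_path, [b"old-p0", b"old-p1"], append)
        # Retry after the stage is resubmitted, landing on the same executor.
        write_map_output(data_path, index_path, [b"new-p0!", b"new-p1!"], append)
        return [read_partition(data_path, index_path, r) for r in range(2)]


print(simulate(append=True))   # [b'old-p0o', b'ld-p1ne']  (garbled)
print(simulate(append=False))  # [b'new-p0!', b'new-p1!']
```

With append mode the data file holds both attempts (26 bytes), but the index describes only the second attempt's 14 bytes starting at offset 0, so readers receive slices that straddle the old and new output.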
