SPARK-37023 (sub-task of SPARK-33235 Push-based Shuffle Improvement Tasks)

Avoid fetching merge status when shuffleMergeEnabled is false for a shuffleDependency during retry


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.1, 3.3.0
    • Component/s: Shuffle
    • Labels: None

    Description

      The assertion below in MapOutputTracker.getMapSizesByExecutorId is not guaranteed to hold:

      assert(mapSizesByExecutorId.enableBatchFetch == true)

      The reason is that in some stage retry cases, shuffleDependency.shuffleMergeEnabled is set to false, but merge statuses still exist because the driver has already collected them for that shuffle dependency. In that case, the current implementation sets enableBatchFetch to false because merge statuses are present, which violates the assertion.

      Details can be found here:

      https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/MapOutputTracker.scala#L1492

      We should improve the implementation here so that merge statuses are not fetched for a shuffleDependency whose shuffleMergeEnabled is false.
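
      The failure mode described above, and one possible guard against it, can be sketched with a simplified model. This is only an illustration under stated assumptions: MapStatus, MergeStatus, MapSizes, convert, and getMapSizes are hypothetical stand-ins, not Spark's real classes or signatures, and the actual fix in MapOutputTracker may be structured differently.

```scala
// Simplified, hypothetical stand-ins; these are NOT Spark's real classes.
final case class MapStatus(mapId: Long)
final case class MergeStatus(reduceId: Int)
final case class MapSizes(blocks: Seq[String], enableBatchFetch: Boolean)

object MergeStatusGuardSketch {

  // Models the behavior described in the issue: whenever merge statuses
  // are present, batch fetch is turned off.
  def convert(
      mapStatuses: Seq[MapStatus],
      mergeStatuses: Option[Seq[MergeStatus]]): MapSizes =
    mergeStatuses match {
      case Some(merged) if merged.nonEmpty =>
        MapSizes(merged.map(m => s"merged_${m.reduceId}"), enableBatchFetch = false)
      case _ =>
        MapSizes(mapStatuses.map(m => s"map_${m.mapId}"), enableBatchFetch = true)
    }

  // Assumed guard: only pass merge statuses on when the dependency still
  // has shuffle merge enabled. On a retry where it was disabled, the
  // already collected merge statuses are ignored.
  def getMapSizes(
      shuffleMergeEnabled: Boolean,
      mapStatuses: Seq[MapStatus],
      collectedMergeStatuses: Option[Seq[MergeStatus]]): MapSizes = {
    val usable = if (shuffleMergeEnabled) collectedMergeStatuses else None
    convert(mapStatuses, usable)
  }
}
```

      With this guard, a retry in which shuffleMergeEnabled is false but merge statuses were already collected still yields enableBatchFetch == true, so the assertion holds. Without the guard (calling convert directly with the collected merge statuses), enableBatchFetch would be false.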


          People

            Assignee: Minchu Yang (minyang)
            Reporter: Ye Zhou (zhouyejoe)
            Votes: 0
            Watchers: 4
