  1. Spark
  2. SPARK-30602 SPIP: Support push-based shuffle to improve shuffle efficiency
  3. SPARK-32921

Extend MapOutputTracker to support tracking and serving the metadata about each merged shuffle partition for a given shuffle in the push-based shuffle scenario


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.2.0
    • Component/s: Shuffle, Spark Core
    • Labels: None

      Description

      MapStatus tracks the metadata about each map task's shuffle output. With push-based shuffle, we similarly need to track the metadata about each merged shuffle partition. We currently call this MergeStatus.

      Because MergeStatus tracks metadata from the perspective of reducer tasks, it would be inefficient to break up the metadata tracked in a MergeStatus and spread it across multiple MapStatus objects.
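
      To illustrate the distinction, here is a minimal Java sketch of keeping map-side and reduce-side metadata in separate structures. It is not Spark's implementation: the class MergeStatusSketch, the nested Tracker, and every field and method name are hypothetical, and the choice of fields (a location string, a bitmap of merged map indices, a total size) is an assumption about what such metadata might contain.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: class, field, and method names are assumptions,
// not Spark's actual MapOutputTracker API.
final class MergeStatusSketch {

    /** Map-side view: one entry per map task, with a block size per reduce partition. */
    static final class MapStatus {
        final String location;       // host holding this map task's output
        final long[] sizePerReduce;  // block size for each reduce partition

        MapStatus(String location, long[] sizePerReduce) {
            this.location = location;
            this.sizePerReduce = sizePerReduce;
        }
    }

    /** Reduce-side view: one entry per merged shuffle partition. */
    static final class MergeStatus {
        final String location;      // host holding the merged partition file
        final BitSet mergedMapIds;  // map tasks whose blocks were merged in
        final long totalSize;       // size of the merged partition

        MergeStatus(String location, BitSet mergedMapIds, long totalSize) {
            this.location = location;
            this.mergedMapIds = mergedMapIds;
            this.totalSize = totalSize;
        }
    }

    /** Tracks both kinds of status per shuffle, indexed by map task and by reduce partition. */
    static final class Tracker {
        private final Map<Integer, MapStatus[]> mapStatuses = new HashMap<>();
        private final Map<Integer, MergeStatus[]> mergeStatuses = new HashMap<>();

        void registerShuffle(int shuffleId, int numMaps, int numReduces) {
            mapStatuses.put(shuffleId, new MapStatus[numMaps]);
            mergeStatuses.put(shuffleId, new MergeStatus[numReduces]);
        }

        void registerMapOutput(int shuffleId, int mapIndex, MapStatus status) {
            mapStatuses.get(shuffleId)[mapIndex] = status;
        }

        void registerMergeResult(int shuffleId, int reduceId, MergeStatus status) {
            mergeStatuses.get(shuffleId)[reduceId] = status;
        }

        /** Map outputs a reducer would still fetch one by one because they were not merged. */
        List<Integer> unmergedMapIndices(int shuffleId, int reduceId) {
            MapStatus[] maps = mapStatuses.get(shuffleId);
            MergeStatus merged = mergeStatuses.get(shuffleId)[reduceId];
            List<Integer> result = new ArrayList<>();
            for (int i = 0; i < maps.length; i++) {
                boolean alreadyMerged = merged != null && merged.mergedMapIds.get(i);
                if (maps[i] != null && !alreadyMerged) {
                    result.add(i);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        tracker.registerShuffle(0, 3, 2);
        for (int m = 0; m < 3; m++) {
            tracker.registerMapOutput(0, m, new MapStatus("host-" + m, new long[] {10L, 20L}));
        }
        BitSet merged = new BitSet();
        merged.set(0);
        merged.set(2);
        tracker.registerMergeResult(0, 0, new MergeStatus("merger-a", merged, 20L));

        System.out.println(tracker.unmergedMapIndices(0, 0)); // prints [1]
        System.out.println(tracker.unmergedMapIndices(0, 1)); // prints [0, 1, 2]
    }
}
```

      In this sketch a reducer needs a single lookup by reduce partition to learn where the merged data lives and which map outputs it already covers. If the same information were folded into the per-map entries, every reducer would have to scan all of them to reassemble it, which is the inefficiency the description points to.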


            People

            • Assignee: Venkata krishnan Sowrirajan (vsowrirajan)
            • Reporter: Min Shen (mshen)
