Spark / SPARK-30602 SPIP: Support push-based shuffle to improve shuffle efficiency / SPARK-32921

Extend MapOutputTracker to track and serve the metadata about each merged shuffle partition for a given shuffle in the push-based shuffle scenario

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.2.0
    • Component/s: Shuffle, Spark Core
    • Labels: None

    Description

      Similar to MapStatus, which tracks the metadata about each map task's shuffle output, we also need to track the metadata about each merged shuffle partition in push-based shuffle. We currently call this metadata MergeStatus.

      Since MergeStatus tracks metadata from the perspective of reducer tasks, it is not efficient to break up the metadata tracked in a MergeStatus and spread it across multiple MapStatus objects.
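
The contrast between the two kinds of metadata can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's MapOutputTracker or its actual MergeStatus class: the names `ToyOutputTracker`, `plan_fetch`, and the fields `merged_map_ids` and `total_size` are hypothetical, and the fallback to unmerged blocks reflects the assumption that merging is best-effort. It shows why keeping one record per reduce partition lets a reducer find its merged data with a single lookup instead of reassembling it from every map task's record.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class MapStatus:
    """Toy per-map-task record: output location and one block size per reducer."""
    location: str
    block_sizes: List[int]  # indexed by reduce partition id


@dataclass(frozen=True)
class MergeStatus:
    """Toy per-reduce-partition record for one merged shuffle partition.

    All field names here are hypothetical, chosen for illustration only.
    """
    location: str                   # where the merged partition file lives
    merged_map_ids: FrozenSet[int]  # map tasks whose blocks were merged in
    total_size: int                 # size of the merged partition


class ToyOutputTracker:
    """Keeps map-side records by map id and merge records by reduce id."""

    def __init__(self, num_maps: int, num_reducers: int) -> None:
        self.map_statuses: List[Optional[MapStatus]] = [None] * num_maps
        self.merge_statuses: List[Optional[MergeStatus]] = [None] * num_reducers

    def register_map_output(self, map_id: int, status: MapStatus) -> None:
        self.map_statuses[map_id] = status

    def register_merge_result(self, reduce_id: int, status: MergeStatus) -> None:
        self.merge_statuses[reduce_id] = status

    def plan_fetch(
        self, reduce_id: int
    ) -> Tuple[Optional[MergeStatus], List[Tuple[int, str, int]]]:
        """Return the merged partition (if any) plus the unmerged blocks.

        Each unmerged block is a (map_id, location, size) tuple.
        """
        merge = self.merge_statuses[reduce_id]  # single reducer-side lookup
        merged_ids = merge.merged_map_ids if merge else frozenset()
        remaining = [
            (map_id, ms.location, ms.block_sizes[reduce_id])
            for map_id, ms in enumerate(self.map_statuses)
            if ms is not None
            and map_id not in merged_ids
            and ms.block_sizes[reduce_id] > 0
        ]
        return merge, remaining
```

In this toy model, a reducer calls `plan_fetch(reduce_id)` once: the merged partition comes from the per-reducer MergeStatus, and only blocks from map tasks absent from `merged_map_ids` are looked up in the MapStatus records.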

          People

            vsowrirajan Venkata krishnan Sowrirajan
            mshen Min Shen