Spark / SPARK-25299 (Use remote storage for persisting shuffle data) / SPARK-28607

Don't hold a reference to two partitionLengths arrays


    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Shuffle
    • Labels:
      None

      Description

      SPARK-28209 introduced the new shuffle writer API and used it in BypassMergeSortShuffleWriter. However, the design of the API forces partition lengths to be tracked twice: once in the plugin's implementation and again in the higher-level writer. Holding two partitionLengths arrays for the same map output wastes memory. We should track partition lengths only in the plugin's implementation and propagate them back up to the writer as the return value of commitAllPartitions.
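      A minimal sketch of the proposed ownership model, written in Java because BypassMergeSortShuffleWriter is a Java class. Only the method name commitAllPartitions comes from this issue; SketchMapOutputWriter, InMemoryMapOutputWriter, SketchShuffleWriter, openPartitionStream and PartitionLengthsDemo are hypothetical names invented for illustration, not Spark's actual interfaces. The plugin owns the single long[] of partition lengths and returns it from commitAllPartitions, and the higher-level writer keeps that reference instead of allocating its own array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical, simplified stand-in for a plugin's per-map-task writer (NOT Spark's real API).
interface SketchMapOutputWriter {
    // Opens a stream for the bytes of one reduce partition.
    OutputStream openPartitionStream(int reducePartitionId) throws IOException;

    // Commits the map output and hands back the length of every partition.
    long[] commitAllPartitions() throws IOException;
}

// Plugin implementation: the only place where partition lengths are tracked.
final class InMemoryMapOutputWriter implements SketchMapOutputWriter {
    private final long[] partitionLengths;
    // Stands in for the shuffle data file a real plugin would write.
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();

    InMemoryMapOutputWriter(int numPartitions) {
        this.partitionLengths = new long[numPartitions];
    }

    @Override
    public OutputStream openPartitionStream(int reducePartitionId) {
        return new OutputStream() {
            @Override
            public void write(int b) {
                data.write(b);
                partitionLengths[reducePartitionId]++; // count bytes per partition
            }
        };
    }

    @Override
    public long[] commitAllPartitions() {
        // Return the plugin's own array instead of making the caller keep a copy.
        return partitionLengths;
    }
}

// Hypothetical higher-level writer, loosely playing the role of BypassMergeSortShuffleWriter.
final class SketchShuffleWriter {
    // Never allocated here: it only references the array returned by the plugin.
    private long[] partitionLengths;

    void write(SketchMapOutputWriter mapOutputWriter, byte[][] bytesByPartition) {
        try {
            for (int p = 0; p < bytesByPartition.length; p++) {
                try (OutputStream out = mapOutputWriter.openPartitionStream(p)) {
                    out.write(bytesByPartition[p]);
                }
            }
            partitionLengths = mapOutputWriter.commitAllPartitions();
        } catch (IOException e) {
            // Wrapped so callers of this sketch need not handle checked exceptions.
            throw new UncheckedIOException(e);
        }
    }

    long[] getPartitionLengths() {
        return partitionLengths;
    }
}

final class PartitionLengthsDemo {
    public static void main(String[] args) {
        SketchShuffleWriter writer = new SketchShuffleWriter();
        writer.write(new InMemoryMapOutputWriter(3), new byte[][] {{1, 2, 3}, {}, {4, 5}});
        System.out.println(Arrays.toString(writer.getPartitionLengths())); // prints [3, 0, 2]
    }
}
```

      With this split, each map task holds one long[] of numPartitions entries rather than two, and the writer cannot drift out of sync with the lengths the plugin actually committed.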


              People

              • Assignee:
                Matt Cheah (mcheah)
                Reporter:
                Matt Cheah (mcheah)
              • Votes:
                0
                Watchers:
                2
