Spark / SPARK-2926

Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle

    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: Shuffle
    • Labels: None

      Description

      Spark has already integrated sort-based shuffle write, which greatly improves IO performance and reduces memory consumption when the number of reducers is very large. On the reducer side, however, it still uses the hash-based shuffle reader implementation, which in some situations ignores the ordering of the map output data.

      Here we propose an MR-style (sort-merge) shuffle reader for sort-based shuffle to further improve its performance.
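
      To illustrate the idea only (this is not the code from the attached design document), here is a minimal sketch of a merge-style read. It assumes every map output destined for one reduce partition is already sorted by key, so the reducer can do a streaming k-way merge instead of rebuilding a hash map. The names merge_sorted_runs, reduce_by_key and map_outputs are hypothetical, and the sketch is in Python for brevity rather than Spark's Scala.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sorted_runs(runs):
    """Lazily k-way merge runs of (key, value) pairs.

    Assumption: each run is already sorted by key, as a sort-based
    shuffle writer would produce when a key ordering is defined.
    """
    return heapq.merge(*runs, key=itemgetter(0))


def reduce_by_key(merged, combine):
    """Fold the values of equal keys in a key-ordered stream.

    Equal keys are adjacent after the merge, so only one key's
    running result is held in memory at a time.
    """
    for key, group in groupby(merged, key=itemgetter(0)):
        values = (value for _, value in group)
        acc = next(values)
        for value in values:
            acc = combine(acc, value)
        yield key, acc


# Hypothetical map outputs for a single reduce partition, each sorted by key.
map_outputs = [
    [("a", 1), ("c", 3)],
    [("a", 2), ("b", 5)],
    [("b", 1), ("c", 4)],
]

result = list(reduce_by_key(merge_sorted_runs(map_outputs), lambda x, y: x + y))
print(result)
```

      The merge keeps one cursor per sorted run and emits records in key order, which is the property a hash-based reader does not exploit.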

      Work-in-progress code and a performance test report will be posted later, once some unit test bugs are fixed.

      Any comments would be greatly appreciated.
      Thanks a lot.

        Attachments

        1. SortBasedShuffleRead.pdf
          56 kB
          Saisai Shao
        2. SortBasedShuffleReader on Spark 2.x.pdf
          3.52 MB
          Li Yuanjian
        3. Spark Shuffle Test Report.pdf
          340 kB
          Saisai Shao
        4. Spark Shuffle Test Report(contd).pdf
          326 kB
          Saisai Shao
        5. Spark Shuffle Test Report on Spark2.x.pdf
          186 kB
          Li Yuanjian

              People

              • Assignee: Saisai Shao (jerryshao)
              • Reporter: Saisai Shao (jerryshao)
              • Votes: 5
              • Watchers: 67
