Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Shuffle, Spark Core
    • Labels: None
    • Target Version/s:

      Description

      Building on the pluggability in SPARK-2044, a sort-based shuffle implementation that takes advantage of an Ordering for keys (or just sorts by hashcode for keys that don't have it) would likely improve performance and memory usage in very large shuffles. Our current hash-based shuffle needs an open file for each reduce task, which can fill up a lot of memory for compression buffers and cause inefficient IO. This would avoid both of those issues.
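The core idea above can be sketched in a few lines. This is a hypothetical illustration, not Spark's actual code: instead of one open file per reduce partition (hash-based shuffle), a map task tags each record with its partition id once, sorts by that id, and writes a single data file plus an index of per-partition offsets that reducers use to fetch their slice.

```python
# Hypothetical sketch of a sort-based shuffle map-side write (not Spark code).
# records: iterable of (key, value); partitioner: key -> partition id.
def write_sorted_shuffle_file(records, partitioner, num_partitions):
    # Compute the partition id once per record, then sort by it (stable).
    tagged = sorted(((partitioner(k), k, v) for k, v in records),
                    key=lambda t: t[0])
    data = []                            # stands in for the single data file
    index = [0] * (num_partitions + 1)   # per-partition record counts, then offsets
    for pid, k, v in tagged:
        data.append((k, v))
        index[pid + 1] += 1
    for i in range(num_partitions):      # counts -> cumulative offsets
        index[i + 1] += index[i]
    return data, index

# A reducer for partition p reads data[index[p]:index[p + 1]].
records = [("a", 1), ("b", 2), ("c", 3), ("a", 4)]
data, index = write_sorted_shuffle_file(records, lambda k: ord(k) % 2, 2)
```

The contrast with hash-based shuffle is that only one file (and one buffer) is open per map task, rather than one per reduce task.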

        Issue Links

          Activity

          pwendell Patrick Wendell added a comment -

          Yes - that's correct.

          nchammas Nicholas Chammas added a comment -

          The 1.1.0 release notes call out this change:

          "This “sort-based shuffle” will become the default in the next release, and is now available to users. For jobs with large numbers of reducers, we recommend turning this on."

          Is turning this on just a matter of setting spark.shuffle.manager to SORT?
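For context, a minimal sketch of that setting (assuming PySpark on Spark 1.1+; the short name for the sort-based manager is "sort"):

```python
# Hedged sketch: set the shuffle manager before creating the SparkContext.
from pyspark import SparkConf

conf = SparkConf().set("spark.shuffle.manager", "sort")
# Equivalently: spark-submit --conf spark.shuffle.manager=sort ...
```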

          pwendell Patrick Wendell added a comment -

          This was fixed by matei in this pull request:
          https://github.com/apache/spark/pull/1499

          apachespark Apache Spark added a comment -

          User 'mateiz' has created a pull request for this issue:
          https://github.com/apache/spark/pull/1499

          matei Matei Zaharia added a comment -

          That's true, that could be a problem. I'll look into either modifying EAOM to support this or doing all the steps while sorting. One thing I wanted to do was to avoid calling the Partitioner multiple times on each key, since that can be expensive.

          jerryshao Saisai Shao added a comment -

          IMHO, in this situation we would otherwise have to manage two buffers: the EAOM and the SortedWriter.

          If we use the EAOM, we can set a custom comparator that first compares partitions and then, if necessary, compares keys or key hash codes. That way the EAOM's iterator comes out sorted by partition (and by key, if necessary) and can be written to a single file without a SortedFileWriter.

          Alternatively, if we use the SortedFileWriter, it would be better to aggregate at the merge step without the EAOM, though that is a little more complex.

          Sorry for any misunderstanding.
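The partition-first comparator described above can be sketched as follows. This is a hypothetical illustration (names and the modulo partitioner are invented, not Spark code): records are ordered first by partition id, and within a partition by the key's hash code.

```python
import functools

# Hypothetical sketch of a partition-first comparator (not Spark code).
def make_shuffle_comparator(partitioner):
    def cmp_records(a, b):
        pa, pb = partitioner(a[0]), partitioner(b[0])
        if pa != pb:                        # partitions compare first
            return -1 if pa < pb else 1
        ha, hb = hash(a[0]), hash(b[0])     # then key hash codes
        return (ha > hb) - (ha < hb)
    return cmp_records

# Example with integer keys and a modulo partitioner (both hypothetical).
recs = [(3, "a"), (2, "b"), (1, "c"), (4, "d")]
ordered = sorted(recs,
                 key=functools.cmp_to_key(make_shuffle_comparator(lambda k: k % 2)))
```

Note that this calls the partitioner on every comparison, which is exactly the per-key cost Matei raises below; caching the partition id alongside each record avoids the repeated calls.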

          matei Matei Zaharia added a comment -

          Right now I was thinking it would happen using an ExternalAppendOnlyMap before we feed values into the SortedFileWriter. This is not ideal though, and it would be good to put it in the SortedFileWriter as we merge files, which would require that to be aware of how to compare keys (or use hash codes if that fails, like the EAOM). But I'll probably start with this initially and then see how complex it is to do it the other way. I think this would still improve performance over the current setup.

          I'll try to post some code soon so that some of these improvements can be done in parallel.
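The aggregation step described above, combining values per key in an append-only map before feeding the writer, can be sketched roughly like this (an in-memory simplification that ignores the spilling ExternalAppendOnlyMap actually does; the function names are illustrative, not Spark's API):

```python
# Hedged, in-memory sketch of map-side combine before the sorted write
# (ignores spilling to disk, which the real ExternalAppendOnlyMap handles).
def map_side_combine(records, create_combiner, merge_value):
    combined = {}
    for k, v in records:
        if k in combined:
            combined[k] = merge_value(combined[k], v)
        else:
            combined[k] = create_combiner(v)
    return combined.items()

# Example: word-count style sum of values per key.
out = dict(map_side_combine([("a", 1), ("b", 2), ("a", 3)],
                            create_combiner=lambda v: v,
                            merge_value=lambda c, v: c + v))
```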

          jerryshao Saisai Shao added a comment -

          Hi Matei, great to see your design doc. A simple question:

          When will we do the map-side combine in sort-based shuffle? At the beginning of the SortedFileWriter, or when merging the intermediate files, as in MR?

          Looking forward to your plan, and I hope I can contribute to this effort.

          matei Matei Zaharia added a comment -

          Oops, attached the wrong file before. Here's the right one.

          matei Matei Zaharia added a comment -

          I've posted a design doc for a simple version of this. This is based on input from Sandy Ryza as well, who had been looking at a prototype by Saisai Shao, so I know several people had been thinking about this. I'd like to get this simple version into 1.1, and then there's room for more optimizations on top (pointed out in the doc).

          Note that our shuffle is a bit different from MapReduce's for a few reasons, e.g. we don't require all objects to be Writable, so we can't make as many assumptions about the serializer, and many of our operators don't need sorted data. This design aims to deal with that while solving the main problem with hash-based shuffle (lots of open files and buffers). In the future we can add options for sorting the data as it goes along.

          mridulm80 Mridul Muralidharan added a comment -

          The plan Tom and I had was to see whether we could modify and adapt Hadoop's shuffle to provide this functionality: secondary sort was an interesting side effect we might get, but not the primary goal.

          Also, we were planning to investigate whether we could use MR's approach to how shuffle output is stored as well.
          This became lower priority as the 2G fix proceeded (since that indirectly alleviated the impact of the current design), but it would still be useful to investigate.


            People

            • Assignee:
              matei Matei Zaharia
            • Reporter:
              matei Matei Zaharia
            • Votes:
              0
            • Watchers:
              23
