Spark / SPARK-1529

Support DFS based shuffle in addition to Netty shuffle


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

      Description

      In some environments, such as MapR, local volumes are accessed through the Hadoop filesystem interface. Shuffle is currently implemented by writing intermediate data to local disk and serving it to remote nodes using Netty as the transport mechanism. We want to provide an HDFS-based shuffle so that data can be written to HDFS (instead of local disk) and served to remote nodes through the HDFS API. This could involve exposing a file system abstraction to the Spark shuffle layer with two modes of operation: in the default mode it writes to local disk, and in the DFS mode it writes to HDFS.
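      The abstraction described above could be sketched roughly as follows. This is an illustrative sketch only, not the actual proposal (which is detailed in the attached design document); the names `ShuffleFileSystem` and `LocalShuffleFileSystem` are hypothetical, as is the commented-out DFS variant.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical abstraction over shuffle storage (names are illustrative,
// not from the actual patch). Spark would write map output through this
// interface: the default mode targets local disk (blocks then served to
// remote nodes over Netty), while a DFS mode would back the same
// interface with org.apache.hadoop.fs.FileSystem so remote nodes read
// blocks directly via the HDFS API.
interface ShuffleFileSystem {
    OutputStream create(String path) throws IOException;
    InputStream open(String path) throws IOException;
    boolean exists(String path);
}

// Default mode: intermediate shuffle data on local disk.
class LocalShuffleFileSystem implements ShuffleFileSystem {
    private final Path root;

    LocalShuffleFileSystem(Path root) {
        this.root = root;
    }

    public OutputStream create(String path) throws IOException {
        Path p = root.resolve(path);
        Files.createDirectories(p.getParent());
        return Files.newOutputStream(p);
    }

    public InputStream open(String path) throws IOException {
        return Files.newInputStream(root.resolve(path));
    }

    public boolean exists(String path) {
        return Files.exists(root.resolve(path));
    }
}

// DFS mode would implement the same interface, delegating each call to
// a Hadoop FileSystem instance, e.g.:
//   class DfsShuffleFileSystem implements ShuffleFileSystem {
//       private final org.apache.hadoop.fs.FileSystem fs;
//       ...
//   }
```

      With this split, shuffle writers and readers are coded once against the interface, and the deployment (local disk vs. HDFS) selects the implementation.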

        Attachments

        1. Spark Shuffle using HDFS.pdf
          87 kB
          Kannan Rajah

              People

              • Assignee: rkannan82 Kannan Rajah
              • Reporter: pwendell Patrick Wendell
              • Votes: 0
              • Watchers: 16

                Dates

                • Created:
                • Updated:
                • Resolved: