SPARK-1529: Support DFS based shuffle in addition to Netty shuffle


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      In some environments, such as with MapR, local volumes are accessed through the Hadoop filesystem interface. Shuffle is currently implemented by writing intermediate data to local disk and serving it to remote nodes using Netty as the transport mechanism. We want to provide an HDFS-based shuffle so that data can be written to HDFS (instead of local disk) and read on remote nodes through the HDFS API. This could involve exposing a file system abstraction to Spark shuffle with two modes of operation: in the default mode it writes to local disk, and in the DFS mode it writes to HDFS. A sketch of such an abstraction follows.
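
      As a rough illustration of the two-mode idea, here is a minimal Scala sketch of what the abstraction could look like. The names (ShuffleFileSystem, LocalDiskShuffleFileSystem, HdfsShuffleFileSystem) are hypothetical and not taken from Spark or the attached design doc; the DFS mode goes through the Hadoop FileSystem API, so any Hadoop-compatible store (e.g., MapR-FS) could back it.

        import java.io.{File, FileInputStream, FileOutputStream, InputStream, OutputStream}

        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.{FileSystem, Path}

        // Hypothetical abstraction over where shuffle files live; these names
        // are illustrative only, not part of Spark or the attached design doc.
        trait ShuffleFileSystem {
          def create(path: String): OutputStream
          def open(path: String): InputStream
          def delete(path: String): Boolean
        }

        // Default mode: shuffle files on the executor's local disk.
        class LocalDiskShuffleFileSystem extends ShuffleFileSystem {
          override def create(path: String): OutputStream = new FileOutputStream(new File(path))
          override def open(path: String): InputStream = new FileInputStream(new File(path))
          override def delete(path: String): Boolean = new File(path).delete()
        }

        // DFS mode: shuffle files on HDFS (or any Hadoop-compatible store),
        // written and read through the Hadoop FileSystem API so remote nodes
        // can fetch blocks without a Netty transfer from the writer.
        class HdfsShuffleFileSystem(conf: Configuration) extends ShuffleFileSystem {
          private val fs = FileSystem.get(conf)
          override def create(path: String): OutputStream = fs.create(new Path(path))
          override def open(path: String): InputStream = fs.open(new Path(path))
          override def delete(path: String): Boolean = fs.delete(new Path(path), false)
        }

      A shuffle manager could then choose one implementation at startup (for example, via a configuration flag), leaving the rest of the shuffle write and fetch path identical in both modes.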

      Attachments

        1. Spark Shuffle using HDFS.pdf (87 kB, Kannan Rajah)


      People

        Assignee: Kannan Rajah (rkannan82)
        Reporter: Patrick Wendell (pwendell)
        Votes: 0
        Watchers: 16
