Spark / SPARK-1529

Support DFS based shuffle in addition to Netty shuffle


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

      Description

      In some environments, such as MapR, local volumes are accessed through the Hadoop filesystem interface. Shuffle is currently implemented by writing intermediate data to local disk and serving it to remote nodes using Netty as the transport mechanism. We want to provide an HDFS-based shuffle so that data can be written to HDFS (instead of local disk) and served to remote nodes through the HDFS API. This could involve exposing a file system abstraction to the Spark shuffle layer with two modes of operation: in the default mode it writes to local disk, and in the DFS mode it writes to HDFS.
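      The abstraction described above could be sketched roughly as follows. This is an illustrative sketch only, not the actual proposal (which is detailed in the attached design document); the names `ShuffleFileSystem` and `LocalShuffleFileSystem` are hypothetical, as is the commented-out DFS variant.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical abstraction over shuffle storage (names are illustrative,
// not from the actual patch). Spark would write map output through this
// interface: the default mode targets local disk (blocks then served to
// remote nodes over Netty), while a DFS mode would back the same
// interface with org.apache.hadoop.fs.FileSystem so remote nodes read
// blocks directly via the HDFS API.
interface ShuffleFileSystem {
    OutputStream create(String path) throws IOException;
    InputStream open(String path) throws IOException;
    boolean exists(String path);
}

// Default mode: intermediate shuffle data on local disk.
class LocalShuffleFileSystem implements ShuffleFileSystem {
    private final Path root;

    LocalShuffleFileSystem(Path root) {
        this.root = root;
    }

    public OutputStream create(String path) throws IOException {
        Path p = root.resolve(path);
        Files.createDirectories(p.getParent());
        return Files.newOutputStream(p);
    }

    public InputStream open(String path) throws IOException {
        return Files.newInputStream(root.resolve(path));
    }

    public boolean exists(String path) {
        return Files.exists(root.resolve(path));
    }
}

// DFS mode would implement the same interface, delegating each call to
// a Hadoop FileSystem instance, e.g.:
//   class DfsShuffleFileSystem implements ShuffleFileSystem {
//       private final org.apache.hadoop.fs.FileSystem fs;
//       ...
//   }
```

      With this split, shuffle writers and readers are coded once against the interface, and the deployment (local disk vs. HDFS) selects the implementation.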

        Attachments

        1. Spark Shuffle using HDFS.pdf
          87 kB
          Kannan Rajah

              People

              • Assignee: rkannan82 Kannan Rajah
              • Reporter: pwendell Patrick Wendell
              • Votes: 0
              • Watchers: 16

                Dates

                • Created:
                • Updated:
                • Resolved: