Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
Description
In some environments, such as MapR, local volumes are accessed through the Hadoop filesystem interface. Shuffle is currently implemented by writing intermediate data to local disk and serving it to remote nodes using Netty as the transport mechanism. We want to provide an HDFS-based shuffle so that data can be written to HDFS (instead of local disk) and served via the HDFS API on remote nodes. This could involve exposing a filesystem abstraction to Spark shuffle with two modes of operation: in the default mode it writes to local disk, and in the DFS mode it writes to HDFS. A sketch of such an abstraction follows.
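As a rough illustration of the proposed abstraction, the sketch below defines a minimal storage trait with a local-disk backend and a Hadoop `FileSystem` backend. All names here (`ShuffleStorage`, `LocalShuffleStorage`, `DfsShuffleStorage`, the `"dfs"` mode string) are hypothetical and not part of any existing Spark API; only the Hadoop `FileSystem`/`Path` calls are real.

```scala
import java.io.{BufferedOutputStream, FileOutputStream, OutputStream}

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical abstraction over where shuffle output is written.
trait ShuffleStorage {
  def createOutput(relativePath: String): OutputStream
}

// Default mode: write intermediate data to local disk, as Spark does today
// (the data would then be served to remote nodes over Netty).
class LocalShuffleStorage(rootDir: String) extends ShuffleStorage {
  override def createOutput(relativePath: String): OutputStream =
    new BufferedOutputStream(new FileOutputStream(s"$rootDir/$relativePath"))
}

// DFS mode: write intermediate data through the Hadoop FileSystem API,
// so remote nodes can read it back with the same API instead of Netty.
class DfsShuffleStorage(rootDir: String, conf: Configuration) extends ShuffleStorage {
  private val fs = FileSystem.get(conf)
  override def createOutput(relativePath: String): OutputStream =
    fs.create(new Path(rootDir, relativePath))
}

object ShuffleStorage {
  // Select a backend from a (hypothetical) mode setting.
  def apply(mode: String, rootDir: String): ShuffleStorage = mode match {
    case "dfs" => new DfsShuffleStorage(rootDir, new Configuration())
    case _     => new LocalShuffleStorage(rootDir)
  }
}
```

With a seam like this, the shuffle write path stays unchanged in the default mode, while a deployment such as MapR could flip a single configuration value to route the same writes through the Hadoop filesystem interface.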
Issue Links
- is related to
  - SPARK-3685 Spark's local dir should accept only local paths (Resolved)
  - SPARK-25299 Use remote storage for persisting shuffle data (Open)