[SPARK-25299] Use remote storage for persisting shuffle data - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.1.0
Fix Version/s: None
Component/s: Shuffle, Spark Core
Labels:
- SPIP

Description

In Spark, the shuffle primitive requires executors to persist data to the local disk of the worker nodes. If an executor crashes, the external shuffle service can continue to serve the shuffle data that the executor wrote, beyond the lifetime of the executor itself. In YARN, Mesos, and Standalone modes, the external shuffle service is deployed on every worker node and shares the local disk with the executors that run on that node.
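
The write and serve path described above can be modeled in a few lines. This is a toy sketch, not Spark code. `WorkerNode`, `Executor`, and `ExternalShuffleService` are hypothetical classes, and a dict stands in for the node's local shuffle directories. The model shows why shuffle output survives an executor crash: the files belong to the node, and the shuffle service on that node keeps serving them. It also shows why the same output does not survive the loss of the node itself.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerNode:
    """A worker host; `local_disk` stands in for its local shuffle directories."""
    name: str
    # (shuffle_id, map_id, reduce_id) -> block bytes
    local_disk: dict = field(default_factory=dict)
    alive: bool = True


class Executor:
    """Hypothetical executor that writes map output to its node's local disk."""

    def __init__(self, node: WorkerNode):
        self.node = node
        self.alive = True

    def write_map_output(self, shuffle_id: int, map_id: int, partitions: list) -> None:
        if not self.alive:
            raise RuntimeError("executor is dead")
        # One block per reduce partition, stored on the node rather than in the executor.
        for reduce_id, data in enumerate(partitions):
            self.node.local_disk[(shuffle_id, map_id, reduce_id)] = data

    def crash(self) -> None:
        # The executor goes away, but the files it wrote stay on the node's disk.
        self.alive = False


class ExternalShuffleService:
    """Hypothetical per-node service that serves blocks independently of executors."""

    def __init__(self, node: WorkerNode):
        self.node = node

    def fetch_block(self, shuffle_id: int, map_id: int, reduce_id: int) -> bytes:
        if not self.node.alive:
            raise ConnectionError(f"{self.node.name} is unreachable")
        return self.node.local_disk[(shuffle_id, map_id, reduce_id)]


node = WorkerNode("host-1")
shuffle_service = ExternalShuffleService(node)
executor = Executor(node)
executor.write_map_output(shuffle_id=0, map_id=0, partitions=[b"a", b"b"])

executor.crash()
print(shuffle_service.fetch_block(0, 0, 1))  # b'b': still served after the executor died

# If the node itself fails, the blocks are gone, and
# shuffle_service.fetch_block(0, 0, 1) raises ConnectionError.
```

The service is tied to the node, not to any executor. For that reason, the node's availability bounds the availability of its shuffle data.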

There are some fundamental shortcomings in the way shuffle is currently implemented. In particular:

- If an external shuffle service process or its node becomes unavailable, every application that ran an executor on that node must recompute the shuffle blocks that were lost.
- Similarly, the external shuffle service must be kept running at all times, which may waste resources when no application is using the shuffle service on that node.
- Mounting local storage can prevent users from taking advantage of the isolation benefits of containerized environments such as Kubernetes. An early prototype of the Kubernetes backend included an external shuffle service implementation, but it was rejected because it strictly required mounting hostPath volumes or other persistent volume setups.
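
The blast radius of losing a node under node-local shuffle storage can be sketched as a simple set computation. The function `apps_needing_recompute` and the `executor_placements` mapping are hypothetical illustrations, not Spark APIs. The sketch simplifies in two ways. First, it assumes that every executor placed on the lost node wrote shuffle output that is still needed. Second, it assumes that "remote storage" is durable and reachable independently of any compute node.

```python
def apps_needing_recompute(executor_placements: dict, lost_node: str,
                           remote_storage: bool = False) -> set:
    """Return the applications that lose shuffle blocks when `lost_node` fails.

    executor_placements maps app_id -> set of nodes that hosted one of its executors.
    """
    if remote_storage:
        # Shuffle files live off-node (assumed durable), so losing a compute node loses none.
        return set()
    # Node-local files: every app that ran an executor on the lost node loses map output there.
    return {app for app, nodes in executor_placements.items() if lost_node in nodes}


placements = {
    "app-1": {"host-1", "host-2"},
    "app-2": {"host-2"},
    "app-3": {"host-3"},
}
print(sorted(apps_needing_recompute(placements, "host-2")))  # ['app-1', 'app-2']
print(apps_needing_recompute(placements, "host-2", remote_storage=True))  # set()
```

With node-local storage, the number of affected applications grows with how many applications were scheduled on the failed node. Moving the data off-node decouples the two.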

In the following architecture discussion document (note: not an SPIP), we brainstorm various high-level architectures for improving the external shuffle service so that it addresses the problems above. The purpose of this umbrella JIRA is to promote further discussion of how to approach these problems, at both the architecture level and the implementation level. We anticipate filing sub-issues that break down the tasks required to achieve this goal.

Edit June 28 2019: Our SPIP is here: https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit

Issue Links

blocks
- SPARK-24432 Add support for dynamic resource allocation (Open)

is blocked by
- SPARK-26268 Decouple shuffle data from Spark deployment (In Progress)

relates to
- SPARK-27941 Serverless Spark in the Cloud (Open)
- SPARK-31924 Create remote shuffle service reference implementation (Open)
- SPARK-1529 Support DFS based shuffle in addition to Netty shuffle (Resolved)

links to
- [GitHub] Pull Request #22777 (ifilonenko)
- Shuffle Metadata Tracking Discussion

Sub-Tasks

#	Summary	Status	Assignee
1.	Shuffle Storage API: Writer API and usage in BypassMergeSortShuffleWriter	Resolved	Matt Cheah
2.	Shuffle Storage API: Reads	Open	Unassigned
3.	Shuffle Storage API: Driver Lifecycle	Resolved	Yifei Huang
4.	Shuffle Storage API: Shuffle Cleanup	Open	Unassigned
5.	Make Javadoc in org.apache.spark.shuffle.api visible	Resolved	Hyukjin Kwon
6.	Shuffle Storage API: Use writer API in UnsafeShuffleWriter	Resolved	Matt Cheah
7.	Don't hold a reference to two partitionLengths arrays	Resolved	Matt Cheah
8.	Shuffle Writer API: Indeterminate shuffle support in ShuffleMapOutputWriter	Resolved	Unassigned
9.	Shuffle storage API: Use API in SortShuffleWriter	Resolved	Matt Cheah
10.	Mark new Shuffle apis as @Experimental (instead of @Private)	Open	Unassigned
11.	Register shuffle map output metadata with a shuffle output tracker	In Progress	Unassigned
12.	Use Spark plugin support to manage shuffle plugin lifecycle	In Progress	Unassigned
13.	Return map output metadata from shuffle writers	Resolved	Matt Cheah
14.	Shuffle Storage API: Dynamic updates of shuffle metadata	Open	Unassigned
15.	Abstract Location in MapStatus to enable support for custom storage	In Progress	Unassigned
16.	Define local/hostlocal/remote fetch for custom storage	Open	Unassigned
17.	Support json serde for the custom location	Open	Unassigned
18.	Allow ShuffleDriverComponent to declare if shuffle data is reliably stored	Resolved	Mridul Muralidharan
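
The sub-tasks above describe a pluggable storage API with several roles: a per-map-task output writer, map output metadata returned from writers, and a driver-side component that can declare whether shuffle data is reliably stored. The sketch below shows that shape in Python. It assumes a remote blob store, simulated here by a dict. All names (`MapOutputWriter`, `DriverComponents`, `RemoteMapOutputWriter`, `InMemoryRemoteStore`) are hypothetical and only loosely inspired by the sub-task titles. They are not the actual `org.apache.spark.shuffle.api` signatures.

```python
import io
from abc import ABC, abstractmethod


class MapOutputWriter(ABC):
    """Hypothetical per-map-task writer: one output stream per reduce partition."""

    @abstractmethod
    def partition_stream(self, reduce_id: int) -> io.BufferedIOBase: ...

    @abstractmethod
    def commit_all_partitions(self) -> list:
        """Persist every partition and return its length in bytes (map output metadata)."""


class DriverComponents(ABC):
    """Hypothetical driver-side hook for the storage plugin."""

    @abstractmethod
    def supports_reliable_storage(self) -> bool:
        """True if shuffle data outlives executors and nodes, so it need not be recomputed."""


class InMemoryRemoteStore:
    """Stand-in for a remote blob store keyed by (shuffle_id, map_id, reduce_id)."""

    def __init__(self):
        self.blobs = {}


class RemoteMapOutputWriter(MapOutputWriter):
    def __init__(self, store: InMemoryRemoteStore, shuffle_id: int, map_id: int,
                 num_partitions: int):
        self.store = store
        self.shuffle_id = shuffle_id
        self.map_id = map_id
        # Buffer each partition locally until commit, then upload it in one piece.
        self.buffers = [io.BytesIO() for _ in range(num_partitions)]

    def partition_stream(self, reduce_id: int) -> io.BufferedIOBase:
        return self.buffers[reduce_id]

    def commit_all_partitions(self) -> list:
        lengths = []
        for reduce_id, buf in enumerate(self.buffers):
            data = buf.getvalue()
            self.store.blobs[(self.shuffle_id, self.map_id, reduce_id)] = data
            lengths.append(len(data))
        return lengths


class RemoteDriverComponents(DriverComponents):
    def supports_reliable_storage(self) -> bool:
        # Data lives in the remote store, not on executor or node disks.
        return True


store = InMemoryRemoteStore()
writer = RemoteMapOutputWriter(store, shuffle_id=0, map_id=3, num_partitions=2)
writer.partition_stream(0).write(b"hello")
writer.partition_stream(1).write(b"!")
print(writer.commit_all_partitions())  # [5, 1]
```

The returned partition lengths play the role of the map output metadata that a driver would track. A driver component that reports reliable storage presumably lets the scheduler remove executors without treating their shuffle output as lost, which is relevant to dynamic allocation without a node-local shuffle service.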

People

Assignee: Unassigned

Reporter: Matt Cheah

Votes: 34

Watchers: 173

Dates

Created: 01/Sep/18 00:25

Updated: 10/Dec/21 22:01