[SPARK-34601] Do not delete shuffle file on executor lost event when using remote shuffle service - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 3.2.0
Fix Version/s: None
Component/s: Shuffle
Labels:
- shuffle

Description

There is ongoing work on several disaggregated/remote shuffle services (e.g. the LinkedIn, Facebook, and Uber shuffle services). Such a remote shuffle service is not the Spark External Shuffle Service: it can be a third-party shuffle solution that users enable by setting spark.shuffle.manager. In those systems, shuffle data is stored on servers other than the executors, so Spark should not mark shuffle data as lost when an executor is lost. We could add a Spark configuration to control this behavior. By default, Spark would still mark the shuffle files as lost; users of a disaggregated/remote shuffle service could set the configuration so that shuffle files are not marked as lost.
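The proposed behavior can be illustrated with a small, self-contained Python model. This is a sketch under stated assumptions, not Spark code: `MapOutputRegistry` is a hypothetical stand-in for the driver's bookkeeping of which executor wrote each shuffle map output, and `remove_shuffle_outputs_on_executor_lost` is a hypothetical name for the proposed configuration (the issue gives no name, and since it was resolved as Won't Fix, no such setting should be assumed to exist in Spark). With the flag at its default, outputs written by a lost executor are dropped so their map tasks would be recomputed; with it disabled, as a remote shuffle service deployment would set it, nothing is dropped.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MapOutputRegistry:
    """Toy model of the driver-side record of which executor holds each map output.

    Hypothetical: this is not a Spark class.
    """

    # Hypothetical stand-in for the proposed configuration; NOT a real Spark setting.
    # True mirrors Spark's default (mark outputs lost); False models a remote shuffle service.
    remove_shuffle_outputs_on_executor_lost: bool = True
    # shuffle_id -> {map_id: executor_id that wrote the map output}
    outputs: Dict[int, Dict[int, str]] = field(default_factory=dict)

    def register(self, shuffle_id: int, map_id: int, executor_id: str) -> None:
        self.outputs.setdefault(shuffle_id, {})[map_id] = executor_id

    def on_executor_lost(self, executor_id: str) -> List[Tuple[int, int]]:
        """Drop outputs written by the lost executor; return the (shuffle_id, map_id) pairs dropped."""
        if not self.remove_shuffle_outputs_on_executor_lost:
            # Shuffle data lives on remote shuffle servers, so it survives the executor.
            return []
        lost = []
        for shuffle_id, maps in self.outputs.items():
            # Copy the items so entries can be deleted while iterating.
            for map_id, owner in list(maps.items()):
                if owner == executor_id:
                    del maps[map_id]
                    lost.append((shuffle_id, map_id))
        return lost


# Default behavior: the output written by exec-1 is marked lost and would be recomputed.
default = MapOutputRegistry()
default.register(0, 0, "exec-1")
default.register(0, 1, "exec-2")
print(default.on_executor_lost("exec-1"))  # [(0, 0)]

# Remote shuffle service: losing exec-1 leaves its map output registered.
remote = MapOutputRegistry(remove_shuffle_outputs_on_executor_lost=False)
remote.register(0, 0, "exec-1")
print(remote.on_executor_lost("exec-1"))  # []
```

Keeping the flag's default at `True` matches the description's requirement that Spark's existing behavior stays unchanged unless a user of a remote shuffle service opts out.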

Issue Links

links to

[Github] Pull Request #31715 (hiboyang)

People

Assignee: Unassigned

Reporter: BoYang

Votes: 0

Watchers: 2

Dates

Created: 03/Mar/21 00:04

Updated: 05/Mar/21 23:32

Resolved: 05/Mar/21 23:32