[CELEBORN-1498] Decide whether to reuse the shuffle id based on the appShuffle's numAvailableOutputs - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Labels:
- pull-request-available

Description

Currently, Celeborn decides whether to reuse a shuffle id based on the appShuffle's deterministic level. However, as the example in ~~CELEBORN-1496~~ shows, a non-INDETERMINATE barrier stage needs Celeborn to create a new shuffle id when it is recomputed, yet Celeborn currently reuses the existing shuffle id.

The key issue is that the deterministic level is not the sole criterion for reusing the shuffle id. In Spark, the DAGScheduler decides whether task output can be reused based on multiple factors, of which the deterministic level is only one. If it determines that the outputs cannot be reused, it proactively cleans up the map output. Celeborn should therefore decide whether to reuse the shuffle id based on whether the map output is empty (that is, on the appShuffle's numAvailableOutputs), which effectively delegates the decision to reuse the results of a previous attempt to the DAGScheduler.
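The two decision rules can be contrasted with a minimal sketch. This is an illustration only, not Celeborn's actual code: the class `ShuffleIdReusePolicy`, its two methods, and the locally declared `DeterministicLevel` enum are hypothetical names, and the count of available map outputs is passed in as a plain integer rather than read from Spark.

```java
public class ShuffleIdReusePolicy {
    /** Stand-in for Spark's deterministic levels, redeclared so the sketch is self-contained. */
    enum DeterministicLevel { DETERMINATE, UNORDERED, INDETERMINATE }

    /** Rule described as the current behaviour: reuse unless the stage is INDETERMINATE. */
    static boolean reuseByDeterministicLevel(DeterministicLevel level) {
        return level != DeterministicLevel.INDETERMINATE;
    }

    /**
     * Proposed rule: reuse only if some map outputs are still available,
     * i.e. the DAGScheduler chose not to clean them up.
     */
    static boolean reuseByAvailableOutputs(int numAvailableOutputs) {
        return numAvailableOutputs > 0;
    }

    public static void main(String[] args) {
        // Assumed scenario: a DETERMINATE barrier stage is recomputed and its
        // map outputs have already been cleaned up, so none are available.
        DeterministicLevel level = DeterministicLevel.DETERMINATE;
        int numAvailableOutputs = 0;

        System.out.println(reuseByDeterministicLevel(level));             // prints "true"
        System.out.println(reuseByAvailableOutputs(numAvailableOutputs)); // prints "false"
    }
}
```

In this assumed scenario the level-based rule keeps the old shuffle id, while the output-based rule requests a new one, which is the behaviour the barrier stage needs.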

In addition, getOutputDeterministicLevel is a protected method of RDD that subclasses can override, so the value it returns could in principle change during execution. Celeborn currently checks it only once, when the shuffle is registered, and would therefore miss such a change. The likelihood is very low, however, and I have not encountered a case where getOutputDeterministicLevel changes during execution.
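The risk of a one-time check can be shown with a small sketch. Everything here is hypothetical: `RegistrationSnapshot`, `RegisteredShuffle`, and the `Supplier` standing in for the RDD's method are illustrative names, not Celeborn or Spark APIs.

```java
import java.util.function.Supplier;

public class RegistrationSnapshot {
    /** Stand-in for Spark's deterministic levels, redeclared so the sketch is self-contained. */
    enum DeterministicLevel { DETERMINATE, UNORDERED, INDETERMINATE }

    /** Hypothetical record of a shuffle whose level is read exactly once, at registration. */
    static final class RegisteredShuffle {
        final DeterministicLevel levelAtRegistration;

        RegisteredShuffle(Supplier<DeterministicLevel> rddLevel) {
            this.levelAtRegistration = rddLevel.get();
        }
    }

    public static void main(String[] args) {
        // Mutable holder simulating an RDD whose reported level changes over time.
        DeterministicLevel[] current = { DeterministicLevel.DETERMINATE };
        Supplier<DeterministicLevel> rddLevel = () -> current[0];

        RegisteredShuffle shuffle = new RegisteredShuffle(rddLevel);
        current[0] = DeterministicLevel.INDETERMINATE; // level changes after registration

        System.out.println(shuffle.levelAtRegistration); // prints "DETERMINATE"
        System.out.println(rddLevel.get());              // prints "INDETERMINATE"
    }
}
```

The value captured at registration no longer matches what the source reports, which is the kind of drift a single check cannot detect.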

Issue Links

links to

GitHub Pull Request #2611

People

Assignee: Unassigned

Reporter: jiang13021

Votes: 0

Watchers: 1

Dates

Created: 10/Jul/24 06:42

Updated: 20/Aug/24 05:37

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1h 50m