Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
One confusing corner case of the DAGScheduler arises when there is a shared shuffle dependency: a job might "skip" the stage associated with that shuffle dependency, since it has already been created as part of a different stage. This means that if there is a fetch failure, the retry will technically happen as part of a different Stage instance.
This already works but lacks tests, so I plan to add a simple test case.
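
For illustration, a minimal sketch of how a shared shuffle dependency can arise from user code. It assumes a local-mode SparkContext; the object name and app name are hypothetical. It does not simulate a fetch failure, and it is not the test case proposed here.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; not part of the Spark code base.
object SharedShuffleSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("shared-shuffle-sketch") // illustrative name
      .setMaster("local[2]")               // assumption: local mode
    val sc = new SparkContext(conf)
    try {
      // reduceByKey introduces a shuffle dependency on the mapped RDD.
      val counts = sc.parallelize(1 to 100, 4)
        .map(i => (i % 10, 1))
        .reduceByKey(_ + _)

      // Job 1: runs the shuffle map stage, then the result stage.
      counts.count()

      // Job 2: depends on the same shuffle. Its map outputs are already
      // available, so the map stage is expected to be reported as skipped.
      counts.collect()
    } finally {
      sc.stop()
    }
  }
}
```

If a fetch failure occurred during the second job, the map stage would need to be rerun even though that job had initially skipped it, which is the case the description refers to.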