  1. Spark
  2. SPARK-4653 DAGScheduler refactoring and cleanup
  3. SPARK-4654

Clean up DAGScheduler's getMissingParentStages() and stageDependsOn() methods

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Scheduler, Spark Core

    Description

      DAGScheduler has getMissingParentStages() and stageDependsOn() methods, which are suspiciously similar to getParentStages(). All of these methods perform traversal of the RDD / Stage graph to inspect parent stages. We can remove both of these methods, though: the set of parent stages is known when a Stage instance is constructed and is already stored in Stage.parents, so we can just check for missing stages by looking for unavailable stages in Stage.parents. Similarly, we can determine whether one stage depends on another by searching Stage.parents rather than performing the entire graph traversal from scratch.
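
      The idea above can be sketched in a few lines of Scala. This is a minimal illustration, not the actual DAGScheduler code: the `Stage` case class here is a hypothetical, simplified stand-in (the `id` and `isAvailable` fields and the `StageGraph` helper object are assumptions made for the example), and it ignores everything else the real scheduler tracks. It shows a missing-parent check that only filters `parents`, and a dependency check that follows `parents` links instead of re-walking the RDD graph.

      ```scala
      // Hypothetical, simplified stand-in for Spark's Stage (not the real class).
      // `parents` holds the direct parent stages, fixed at construction time.
      final case class Stage(id: Int, parents: List[Stage], isAvailable: Boolean)

      // Hypothetical helper object; the names are illustrative only.
      object StageGraph {
        // Direct parents whose output is not yet available, ordered by stage id.
        def missingParents(stage: Stage): List[Stage] =
          stage.parents.filterNot(_.isAvailable).sortBy(_.id)

        // True if `target` is `stage` itself or is reachable by following
        // `parents` links. Each stage id is visited at most once.
        def dependsOn(stage: Stage, target: Stage): Boolean = {
          val visited = scala.collection.mutable.Set[Int]()
          def visit(s: Stage): Boolean =
            s.id == target.id || (visited.add(s.id) && s.parents.exists(visit))
          visit(stage)
        }
      }
      ```

      Note that the dependency check in this sketch still follows `parents` transitively, since `parents` is assumed to hold only direct parents; the saving is that it walks the already-built stage graph rather than the RDD graph.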

            People

              Assignee: Josh Rosen (joshrosen)
              Reporter: Josh Rosen (joshrosen)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: