  Spark / SPARK-8987

Increase test coverage of DAGScheduler


    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: Scheduler, Tests
    • Labels:


      DAGScheduler is one of the most monstrous pieces of code in Spark. Every time someone changes something there, something like the following happens:

      (1) Someone pings a committer
      (2) The committer pings a scheduler maintainer
      (3) The scheduler maintainer correctly points out bugs in the patch
      (4) The author of the patch fixes those bugs but introduces new ones
      (5) Repeat steps 3-4 N times
      (6) Other committers / contributors jump in and start debating
      (7) The patch goes stale for months

      All of this happens because no one, including the committers, has high confidence that a particular change doesn't break some corner case in the scheduler. I believe one of the main issues is the lack of sufficient test coverage, which is not a luxury but a necessity for logic as complex as the DAGScheduler.

      As of the writing of this JIRA, DAGScheduler has ~1500 lines, while DAGSchedulerSuite has only ~900 lines. I would argue that the suite should actually be several times longer than the code it tests.

      If you wish to work on this, let me know and I will assign it to you. Anyone is welcome.




            • Assignee: Andrew Or (andrewor14)
            • Votes: 2
            • Watchers: 8


              • Created: