Spark / SPARK-8987

Increase test coverage of DAGScheduler

Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: Scheduler, Spark Core, Tests

    Description

      DAGScheduler is one of the most monstrous pieces of code in Spark. Every time someone changes something there, something like the following happens:

      (1) Someone pings a committer
      (2) The committer pings a scheduler maintainer
      (3) Scheduler maintainer correctly points out bugs in the patch
      (4) Author of patch fixes the bugs but introduces more bugs
      (5) Repeat steps 3 - 4 N times
      (6) Other committers / contributors jump in and start debating
      (7) The patch goes stale for months

      All of this happens because no one, including the committers, has high confidence that a particular change doesn't break some corner case in the scheduler. I believe one of the main issues is the lack of sufficient test coverage, which is not a luxury but a necessity for logic as complex as the DAGScheduler.

      As of the writing of this JIRA, DAGScheduler has ~1500 lines, while DAGSchedulerSuite has only ~900 lines. I would argue that the suite's line count should actually be many multiples of that of the code it tests.

      If you wish to work on this, let me know and I will assign it to you. Anyone is welcome.

        There are no Sub-Tasks for this issue.

          People

            Assignee: Unassigned
            Reporter: Andrew Or (andrewor14)
            Votes: 2
            Watchers: 7