Spark / SPARK-8987

Increase test coverage of DAGScheduler

    Details

    • Type: Umbrella
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: Scheduler, Tests
    • Labels:
      None

      Description

      DAGScheduler is one of the most monstrous pieces of code in Spark. Every time someone changes something there, something like the following happens:

      (1) Someone pings a committer
      (2) The committer pings a scheduler maintainer
      (3) Scheduler maintainer correctly points out bugs in the patch
      (4) The patch author fixes the bugs but introduces more bugs
      (5) Repeat steps 3-4 N times
      (6) Other committers / contributors jump in and start debating
      (7) The patch goes stale for months

      All of this happens because no one, including the committers, has high confidence that a particular change doesn't break some corner case in the scheduler. I believe one of the main issues is the lack of sufficient test coverage, which is not a luxury but a necessity for logic as complex as the DAGScheduler.

      As of this writing, DAGScheduler has ~1500 lines, while DAGSchedulerSuite has only ~900 lines. I would argue that the suite's line count should actually be many times that of the code it tests.

      If you wish to work on this, let me know and I will assign it to you. Anyone is welcome.

        Activity

        qqsun8819 OuyangJin added a comment:

        I'd like to work on this


          People

          • Assignee: Unassigned
          • Reporter: andrewor14 Andrew Or
          • Votes: 2
          • Watchers: 8
