Spark / SPARK-1455

Determine which test suites to run based on code changes


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: Project Infra
    • Labels: None

    Description

      Right now we run the entire test suite for every change, so the tests take a long time. Our pull request builder checks out the merge branch from git, so we could do a diff to figure out which source files changed and run a smaller, more isolated set of tests. The tests we run should reflect the inter-dependencies of the project. For example:

      • If Spark core is modified, we should run all tests
      • If just SQL is modified, we should run only the SQL tests
      • If just Streaming is modified, we should run only the streaming tests
      • If just PySpark is modified, we should run only the PySpark tests

      And so on. I think this would cut the round-trip time (RTT) of the tests considerably, and it should be fairly easy to accomplish with some scripting.
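
      The rules above can be sketched as a small script. This is only an illustration under stated assumptions, not the change that resolved this issue: the directory prefixes (`sql/`, `streaming/`, `python/`), the module names, and the helpers `changed_files` and `modules_to_test` are hypothetical. Any path that does not match a known prefix (including core) falls back to running everything.

```python
import subprocess

# Hypothetical mapping from a top-level directory prefix to a test module name.
MODULE_PREFIXES = {
    "sql/": "sql",
    "streaming/": "streaming",
    "python/": "pyspark",
}

# Hypothetical list of every test module; used when a change may affect everything.
ALL_MODULES = ["core", "sql", "streaming", "pyspark"]


def changed_files(base_ref):
    """Return the paths that differ between base_ref and the checked-out commit."""
    out = subprocess.check_output(
        ["git", "diff", "--name-only", base_ref, "HEAD"]
    )
    return [line for line in out.decode("utf-8").splitlines() if line]


def modules_to_test(paths):
    """Pick the test modules to run for a list of changed file paths."""
    selected = set()
    for path in paths:
        for prefix, module in MODULE_PREFIXES.items():
            if path.startswith(prefix):
                selected.add(module)
                break
        else:
            # Core, or anything we do not recognize: be conservative, run all tests.
            return list(ALL_MODULES)
    return sorted(selected)
```

      For instance, `modules_to_test(changed_files("origin/master"))` would select only the SQL tests for a change confined to `sql/`, assuming `origin/master` is the branch the pull request targets. Treating unknown paths as "run everything" keeps the shortcut safe: a wrong mapping can only cost time, not skip a needed test.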


            People

              Assignee: Unassigned
              Reporter: Patrick Wendell (pwendell)
              Votes: 2
              Watchers: 4
