Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Right now we run the entire set of tests for every change, so test runs take a long time. Our pull request builder checks out the merge branch from git, so we could diff it to find out which source files changed and run a more targeted set of tests instead, chosen to reflect the inter-dependencies of the project. For example:
- If Spark core is modified, we should run all tests.
- If only SQL is modified, we should run only the SQL tests.
- If only Streaming is modified, we should run only the Streaming tests.
- If only PySpark is modified, we should run only the PySpark tests.
And so on. I think this would reduce the round-trip time (RTT) of test runs a lot, and it should be fairly easy to accomplish with some scripting.
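A minimal sketch of that scripting, assuming a simplified module graph in which every module depends only on core: list the files changed against a base ref, map each file to a module by directory prefix, then add every module that transitively depends on a changed one. The module names, the directory prefixes in the hypothetical `MODULE_PATHS` table, the hypothetical `MODULE_DEPS` graph, and the `origin/master` base ref are all illustrative assumptions, not the change that resolved this issue. Files outside any known module fall back to running everything.

```python
import subprocess
from typing import Dict, Iterable, List, Set

# Hypothetical dependency graph: module -> modules it depends on.
# Mirrors the examples in the description (everything depends on core).
MODULE_DEPS: Dict[str, Set[str]] = {
    "core": set(),
    "sql": {"core"},
    "streaming": {"core"},
    "pyspark": {"core"},
}

# Hypothetical directory prefixes used to assign a changed file to a module.
MODULE_PATHS: Dict[str, str] = {
    "core": "core/",
    "sql": "sql/",
    "streaming": "streaming/",
    "pyspark": "python/",
}


def changed_files(base: str = "origin/master") -> List[str]:
    """List files changed relative to `base` (the default ref is an assumption)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def modules_for(files: Iterable[str]) -> Set[str]:
    """Map changed files to modules; any unrecognized file selects every module."""
    modules: Set[str] = set()
    for path in files:
        matches = [m for m, prefix in MODULE_PATHS.items() if path.startswith(prefix)]
        if not matches:
            return set(MODULE_DEPS)  # be conservative: run all tests
        modules.update(matches)
    return modules


def modules_to_test(changed: Set[str]) -> Set[str]:
    """Return the changed modules plus every module that transitively depends on them."""
    to_test = set(changed)
    grew = True
    while grew:
        grew = False
        for module, deps in MODULE_DEPS.items():
            if module not in to_test and deps & to_test:
                to_test.add(module)
                grew = True
    return to_test


if __name__ == "__main__":
    # A SQL-only change selects only the SQL tests.
    print(sorted(modules_to_test(modules_for(["sql/core/src/main/scala/Foo.scala"]))))
    # prints ['sql']
    # A core change selects every module, since all of them depend on core.
    print(sorted(modules_to_test(modules_for(["core/src/main/scala/Bar.scala"]))))
    # prints ['core', 'pyspark', 'sql', 'streaming']
```

In a real builder, `changed_files()` would feed `modules_for()`. The fixed-point loop in `modules_to_test()` is deliberately simple; for a graph of a handful of modules it is cheap, and it handles chains of dependencies (e.g. a module depending on SQL) without extra code.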
Issue Links
- is duplicated by
  - SPARK-3534 Avoid running MLlib and Streaming tests when testing SQL PRs (Resolved)