Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
Regression test selection (RTS) is a methodology for reducing the number of tests that are run in regression test suites. The idea is that, for a given change, only the tests relevant to that change are run, rather than all the tests.
For example, right now Hive QA runs all the standalone-metastore tests for every patch. However, most of the time this isn't necessary. If a patch only modifies files in ql or common, there is no need to run the standalone-metastore tests, because standalone-metastore does not depend on any other Hive module (except for storage-api).
RTS is commonly used in CI systems. Google has published some interesting info on how they do this:
- http://google-engtools.blogspot.com/2011/06/testing-at-speed-and-scale-of-google.html
- https://drive.google.com/file/d/0Bx-FLr0Egz9zYXJfMEZ6NERTbkU/view
- Bazel seems to provide some functionality to do this: http://code.hootsuite.com/faster-automated-tests-bazel/
There are also other open-source projects that offer different ways of doing this, for example Ekstazi.
A short-term solution would be to implement the following:
- Before each Hive QA run, parse the Maven dependency graph
- Take the specified patch and check which Maven modules it modifies
- Run the tests contained in the modified modules and in the modules that depend on them
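
The steps above could be sketched roughly as follows. This is only an illustration, not the Hive QA implementation: the dependency map is hand-written and simplified rather than parsed from Maven, the function names `modules_touched` and `modules_to_test` are hypothetical, and the sketch assumes that file paths in the patch are relative to the repository root and start with the module directory name.

```python
from collections import deque

# Hypothetical, simplified map: module -> modules it depends on.
# A real version would be built from the parsed Maven dependency graph;
# these entries are illustrative and are not the actual Hive graph.
DEPENDS_ON = {
    "storage-api": set(),
    "standalone-metastore": {"storage-api"},
    "common": {"storage-api"},
    "ql": {"common", "standalone-metastore", "storage-api"},
}


def modules_touched(changed_files, modules):
    """Return the modules whose directory contains at least one changed file.

    Assumes each module lives in a top-level directory named after the module.
    Files that match no module are ignored here.
    """
    touched = set()
    for path in changed_files:
        matches = [m for m in modules if path.startswith(m + "/")]
        if matches:
            # Prefer the longest matching prefix in case module names nest.
            touched.add(max(matches, key=len))
    return touched


def modules_to_test(touched, depends_on):
    """Return the touched modules plus every module that depends on them,
    directly or transitively."""
    # Invert the edges: module -> modules that depend on it.
    dependents = {m: set() for m in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Breadth-first walk over the inverted graph.
    selected = set(touched)
    queue = deque(touched)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in selected:
                selected.add(dependent)
                queue.append(dependent)
    return selected


if __name__ == "__main__":
    patch_files = ["ql/src/java/Example.java"]  # hypothetical patch contents
    touched = modules_touched(patch_files, DEPENDS_ON)
    print(sorted(modules_to_test(touched, DEPENDS_ON)))
```

With this illustrative map, a patch that only touches ql selects only ql, so the standalone-metastore tests are skipped, while a patch that touches storage-api selects every module. A real implementation would probably also need a fallback that runs everything when a changed file, such as a root-level build file, does not map to any module.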