[FLINK-19520] Add reliable test randomization for checkpointing - ASF JIRA

Attach files

Attach Screenshot

Voters

Watch issue

Watchers

Create sub-task

Link

Clone

Update Comment Author

Replace String in Comment

Update Comment Visibility

Delete Comments

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.12.0
Fix Version/s: 1.13.0
Component/s: Runtime / Configuration
Labels:
- pull-request-available

Description

With the larger refactoring of checkpoint alignment and the additional of more unaligned checkpoint settings, it becomes increasingly important to provide a large test coverage.

Unfortunately, adding sufficient test cases in a test matrix appears to be unrealistic: many of the encountered issues were subtle, sometimes caused by race conditions or unusual test configurations and often only visible in e2e tests.

Hence, we like to rely on all existing Flink tests to provide a sufficient coverage for checkpointing. However, as more and more options in unaligned checkpoint are going to be implemented in this and the upcoming release, running all Flink tests - especially e2e - in a test matrix is prohibitively expensive, even for nightly builds.

Thus, we want to introduce test randomization for all tests that do not use a specific checkpointing mode. In a similar way, we switched from aligned checkpoints by default in tests to unaligned checkpoint during the last release cycle.

To not burden the developers of other components too much, we set the following requirements:

Randomization should be seeded in a way that both builds on Azure pipelines and local builds will result in the same settings to ease debugging and ensure reproducibility.
Randomized options should be shown in the test log.
Execution order of test cases will not influence the randomization.
Randomization is hidden, no change on any test is needed.
Randomization only happens during local/remote test execution. User deployments are not affected.
Test developers are able to avoid randomization by explicitly providing checkpoint configs.