Details
- Type: Bug
- Status: Open
- Priority: Not a Priority
- Resolution: Unresolved
Description
The test org.apache.flink.runtime.checkpoint.CheckpointCoordinatorTest.testTriggerAndDeclineCheckpointComplex is flaky and has the following failure:
[ERROR] Failures:
[ERROR] CheckpointCoordinatorTest.testTriggerAndDeclineCheckpointComplex:1054 expected:<2> but was:<1>
I used the NonDex tool to find this flaky test.
Command: mvn -pl flink-runtime edu.illinois:nondex-maven-plugin:1.1.2:nondex -Dtest=org.apache.flink.runtime.checkpoint.CheckpointCoordinatorTest#testTriggerAndDeclineCheckpointComplex
I analyzed the assertion failure and found that checkpoint1Id and checkpoint2Id are assigned by iterating over a HashMap.
HashMap makes no guarantees about its iteration order (see the JavaDoc), so the two IDs can be assigned in either order, and the test fails when that order differs from the one the assertions expect.
To remove this non-determinism, I propose changing the HashMap to a LinkedHashMap, which iterates in insertion order.
Further analysis showed that the Map is initialized on line 1894 of the org.apache.flink.runtime.checkpoint.CheckpointCoordinator class.
After changing it from HashMap to LinkedHashMap, the test above passes without any non-determinism.
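A minimal sketch of why LinkedHashMap removes the order dependence. This is not the CheckpointCoordinator code: the class IterationOrderSketch, the helpers newPendingMap and keysInIterationOrder, and the sample IDs are all hypothetical and only illustrate the pattern of reading IDs off a map's iterator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IterationOrderSketch {

    // Hypothetical stand-in for the map of pending checkpoints.
    // LinkedHashMap iterates in insertion order; a plain HashMap
    // gives no ordering guarantee, which is what NonDex exploits.
    static Map<Long, String> newPendingMap() {
        return new LinkedHashMap<>();
    }

    // Collects the keys in the order the map's iterator yields them,
    // mirroring how a test might read checkpoint IDs off the map.
    static List<Long> keysInIterationOrder(Map<Long, String> pending) {
        return new ArrayList<>(pending.keySet());
    }

    public static void main(String[] args) {
        Map<Long, String> pending = newPendingMap();
        pending.put(1_000_003L, "first");  // inserted first
        pending.put(7L, "second");         // inserted second

        // With LinkedHashMap the first inserted ID always comes first.
        System.out.println(keysInIterationOrder(pending)); // prints [1000003, 7]
    }
}
```

With a HashMap in place of the LinkedHashMap, the same code would compile and usually print some fixed order on a given JVM, but that order is unspecified, so a test that assumes it can fail under NonDex.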