Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-29789

Fix flaky tests in CheckpointCoordinatorTest

    XMLWordPrintableJSON

Details

    Description

      The test org.apache.flink.runtime.checkpoint.CheckpointCoordinatorTest.testTriggerAndDeclineCheckpointComplex is flaky and has the following failure:

      Failures:
      [ERROR] Failures:
      [ERROR]   CheckpointCoordinatorTest.testTriggerAndDeclineCheckpointComplex:1054 expected:<2> but was:<1>

      I used the tool NonDex to find this flaky test.
      Command: mvn -pl flink-runtime edu.illinois:nondex-maven-plugun:1.1.2:nondex -Dtest=org.apache.flink.runtime.checkpoint.CheckpointCoordinatorTest#testTriggerAndDeclineCheckpointComplex

      I analyzed the assertion failure and found that checkpoint1Id and checkpoint2Id are getting assigned by iterating over a HashMap.
      As we know, iterator() returns elements in a random order (JavaDoc) and this might cause test failures for some orders.

      Therefore, to remove this non-determinism, we would change HashMap to LinkedHashMap.
      On further analysis, it was found that the Map is getting initialized on line 1894 of org.apache.flink.runtime.checkpoint.CheckpointCoordinator class.

      After changing from HashMap to LinkedHashMap, the above test is passing without any non-determinism.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              sopan98 Sopan Phaltankar
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated: