IGNITE-14671

Test IgniteClusterSnapshotCheckTest#testClusterSnapshotCheckOtherCluster is flaky

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 2.10
    • Fix Version/s: 2.11
    • None

    Description

      To reproduce the failure, run the test repeatedly, for example by configuring the IDE to run it with the 'repeat until failure' option. It eventually fails with an assertion error:

      java.lang.AssertionError: Number of jobs must be equal to the cluster size (except local node): [a2844419-3081-432a-b611-c4f891900005]
      
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.assertTrue(Assert.java:41)
      	at org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
      	at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
      

      With patch [1] applied, which also prints the number of collected node IDs, the exception looks as follows:

      java.lang.AssertionError: Number of jobs must be equal to the cluster size (except local node): [e7346d3b-b257-466c-95c2-0a85a7600005], count: 1
      
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.assertTrue(Assert.java:41)
      	at org.apache.ignite.testframework.junits.JUnitAssertAware.assertTrue(JUnitAssertAware.java:29)
      	at org.apache.ignite.internal.processors.cache.persistence.snapshot.IgniteClusterSnapshotCheckTest.testClusterSnapshotCheckOtherCluster(IgniteClusterSnapshotCheckTest.java:316)
      

      This appears to be a concurrent-update problem with a thread-unsafe HashSet (see [2, 3]):

      Unsafe HashSet
      Set<UUID> assigns = new HashSet<>();
      
      Concurrent update
      grid(i).context().io().addMessageListener(GridTopic.TOPIC_JOB, new GridMessageListener() {
          @Override public void onMessage(UUID nodeId, Object msg, byte plc) {
              if (msg instanceof GridJobExecuteRequest) {
                  GridJobExecuteRequest msg0 = (GridJobExecuteRequest)msg;
      
                  if (msg0.getTaskName().contains(SnapshotPartitionsVerifyTask.class.getName()))
                      assigns.add(locNodeId);
              }
          }
      });
      

      With a concurrent Set implementation, the problem no longer reproduces (see patch [4]):

      Set<UUID> assigns = Collections.newSetFromMap(new ConcurrentHashMap<>());
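
The effect of the fix can be illustrated outside of Ignite with a minimal, self-contained sketch. It is not the test code itself: the class name ConcurrentAssignsSketch, the collect helper, and the synthetic UUIDs are hypothetical stand-ins for the message listeners that call assigns.add(locNodeId) from different threads. Each writer thread adds one distinct ID, and the final size is compared with the number of writers.

```java
import java.util.Collections;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentAssignsSketch {
    /** Adds {@code nodes} distinct IDs to {@code assigns}, one per thread, and returns the final size. */
    public static int collect(Set<UUID> assigns, int nodes) {
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        CountDownLatch start = new CountDownLatch(1);

        for (int i = 0; i < nodes; i++) {
            UUID nodeId = new UUID(0L, i); // Hypothetical stand-in for a local node ID.

            pool.execute(() -> {
                try {
                    start.await(); // Release all writers together to maximize contention.
                    assigns.add(nodeId);
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        start.countDown();
        pool.shutdown();

        try {
            // Waiting for termination also makes the writers' updates visible to this thread.
            if (!pool.awaitTermination(30, TimeUnit.SECONDS))
                throw new IllegalStateException("Writers did not finish in time");
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        return assigns.size();
    }

    public static void main(String[] args) {
        // Same construction as in the fix: a Set view backed by a ConcurrentHashMap.
        Set<UUID> safe = Collections.newSetFromMap(new ConcurrentHashMap<>());

        System.out.println("concurrent set size: " + collect(safe, 32));
    }
}
```

With the concurrent set, the returned size always equals the number of writers. Passing new HashSet<>() instead may occasionally return a smaller value because concurrent additions can be lost, which would match the "count: 1" seen in the assertion message above. Such a loss is not guaranteed on any single run, which is consistent with the test being flaky rather than failing every time.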
      
      1. testClusterSnapshotCheckOtherCluster_printCount.patch
      2. IgniteClusterSnapshotCheckTest.java#L287
      3. IgniteClusterSnapshotCheckTest.java#L300
      4. testClusterSnapshotCheckOtherCluster_fix.patch

            People

              Assignee: Ilya Shishkov (shishkovilja)
              Reporter: Ilya Shishkov (shishkovilja)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m