Uploaded image for project: 'Ignite'
  1. Ignite
  2. IGNITE-18737

Tests from ItIgniteNodeRestartTest became flaky

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 3.0
    • None

    Description

      Motivation

      After https://issues.apache.org/jira/browse/IGNITE-18088 test from ItIgniteNodeRestartTest started to fail. Need to investigate the reason of fails

      Caused by: java.util.concurrent.CancellationException
      	at java.base/java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2396)
      	at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:491)
      	at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:473)
      	at org.apache.ignite.internal.raft.RaftGroupServiceImpl.refreshAndGetLeaderWithTerm(RaftGroupServiceImpl.java:232)
      	at org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.ensureReplicaIsPrimary(PartitionReplicaListener.java:1880)
      	at org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.invoke(PartitionReplicaListener.java:275)
      	at org.apache.ignite.internal.replicator.Replica.processRequest(Replica.java:61)
      	at org.apache.ignite.internal.replicator.ReplicaManager.lambda$new$5(ReplicaManager.java:162)
      	at org.apache.ignite.network.DefaultMessagingService.sendToSelf(DefaultMessagingService.java:279)
      	at org.apache.ignite.network.DefaultMessagingService.invoke0(DefaultMessagingService.java:226)
      	at org.apache.ignite.network.DefaultMessagingService.invoke(DefaultMessagingService.java:154)
      	at org.apache.ignite.internal.replicator.ReplicaService.sendToReplica(ReplicaService.java:91)
      	at org.apache.ignite.internal.replicator.ReplicaService.invoke(ReplicaService.java:178)
      	at org.apache.ignite.internal.tx.impl.TxManagerImpl.cleanup(TxManagerImpl.java:150)
      	at org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processTxFinishAction$42(PartitionReplicaListener.java:984)
      	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
      	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
      	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
      	at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$sendWithRetry$38(RaftGroupServiceImpl.java:526)
      	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
      	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
      	at java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479)
      	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
      	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
      	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
      	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
      	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

      The root cause is that distribution zones data nodes watch listener in TableManager invokes updatePendingAssignmentsKeys with the same data nodes as in table configuration assignments so RaftGroupServiceImpl stops and new the same RaftGroupServiceImpl starts. It cancels invokes to old RaftGroupServiceImpl so we have CancellationException at sendWithRetry.

      Implementation Notes

      It is not need to change the assignments if the calculated assignments equals to the assignments in the table configuration.

      Attachments

        Issue Links

          Activity

            People

              Sergey Uttsel Sergey Uttsel
              maliev Mirza Aliev
              Mirza Aliev Mirza Aliev
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h 50m
                  1h 50m