Description
Motivation
After https://issues.apache.org/jira/browse/IGNITE-18088 test from ItIgniteNodeRestartTest started to fail. Need to investigate the reason of fails
Caused by: java.util.concurrent.CancellationException at java.base/java.util.concurrent.CompletableFuture.cancel(CompletableFuture.java:2396) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:491) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:473) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.refreshAndGetLeaderWithTerm(RaftGroupServiceImpl.java:232) at org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.ensureReplicaIsPrimary(PartitionReplicaListener.java:1880) at org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.invoke(PartitionReplicaListener.java:275) at org.apache.ignite.internal.replicator.Replica.processRequest(Replica.java:61) at org.apache.ignite.internal.replicator.ReplicaManager.lambda$new$5(ReplicaManager.java:162) at org.apache.ignite.network.DefaultMessagingService.sendToSelf(DefaultMessagingService.java:279) at org.apache.ignite.network.DefaultMessagingService.invoke0(DefaultMessagingService.java:226) at org.apache.ignite.network.DefaultMessagingService.invoke(DefaultMessagingService.java:154) at org.apache.ignite.internal.replicator.ReplicaService.sendToReplica(ReplicaService.java:91) at org.apache.ignite.internal.replicator.ReplicaService.invoke(ReplicaService.java:178) at org.apache.ignite.internal.tx.impl.TxManagerImpl.cleanup(TxManagerImpl.java:150) at org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processTxFinishAction$42(PartitionReplicaListener.java:984) at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$sendWithRetry$38(RaftGroupServiceImpl.java:526) at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) at java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479) at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
The root cause is that distribution zones data nodes watch listener in TableManager invokes updatePendingAssignmentsKeys with the same data nodes as in table configuration assignments so RaftGroupServiceImpl stops and new the same RaftGroupServiceImpl starts. It cancels invokes to old RaftGroupServiceImpl so we have CancellationException at sendWithRetry.
Implementation Notes
It is not need to change the assignments if the calculated assignments equals to the assignments in the table configuration.
Attachments
Issue Links
- causes
-
IGNITE-18847 Tests of ItIgniteNodeRestartTest hang
- Resolved
- relates to
-
IGNITE-18714 Not thread safe usage of InternalTableImpl#partitionMap
- Resolved
- links to