Description
Simultaneously creating and removing caches with the same name may lead to a NullPointerException in the disco-notifier thread, which triggers the FailureHandler. With assertions enabled, the same problem surfaces as the following AssertionError:
[2022-04-04 14:22:41,571][ERROR][disco-notifier-worker-#36%cache.IgniteDynamicCacheStartFailTest0%][GridDiscoveryManager] Exception in discovery notifier worker thread.
java.lang.AssertionError: Dynamic cache descriptor is missing [cacheName=TestDynamicCache]
    at org.apache.ignite.internal.processors.cache.ClusterCachesInfo.onCacheChangeRequested(ClusterCachesInfo.java:570)
    at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onCustomEvent(GridCacheProcessor.java:4307)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:680)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.access$7500(GridDiscoveryManager.java:559)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4$NotificationTask.run(GridDiscoveryManager.java:994)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2852)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2890)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
    at java.lang.Thread.run(Thread.java:748)
It looks like the issue is caused by concurrently starting and stopping caches with the same name.
The following scenario results in the AssertionError above (when assertions are disabled, it leads to the NullPointerException mentioned earlier):
- the user starts a new cache with the name "A"
- the DynamicCacheChangeRequest is sent over the cluster ring
- every node that receives this message updates its list of registered cache descriptors (see ClusterCachesInfo.onCacheChangeRequested(DynamicCacheChangeBatch, AffinityTopologyVersion))
- a node initiates a new partition map exchange (PME)
- the user tries to stop the cache with the same name "A"
- a new DynamicCacheChangeRequest is sent and, therefore, the descriptor of "A" is removed from the list of registered caches
- at this point, the previous exchange (the PME related to the cache start) fails for some reason
- the DynamicCacheChangeFailureMessage is sent over the ring and tries to find the required cache descriptor on every node, but that descriptor has already been removed.
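The scenario above can be modeled as a deterministic sequence of events on a single node. The sketch below is a hypothetical, simplified model, not Ignite code: `CacheDescriptorRace`, `DescriptorRegistry`, `CacheDescriptor` and the `onStartRequest`/`onStopRequest`/`onStartFailure` methods are invented for illustration. They stand in for the descriptor bookkeeping in ClusterCachesInfo and for the handling of DynamicCacheChangeRequest and DynamicCacheChangeFailureMessage. The model only shows that once the stop request has removed the descriptor, the failure message for the earlier start finds nothing.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, single-threaded model of the start/stop/failure sequence.
 * None of these classes exist in Ignite; they only mimic the ordering of events.
 */
public class CacheDescriptorRace {

    /** Hypothetical stand-in for a dynamic cache descriptor. */
    public static final class CacheDescriptor {
        final String cacheName;

        CacheDescriptor(String cacheName) {
            this.cacheName = cacheName;
        }
    }

    /** Hypothetical per-node registry of cache descriptors. */
    public static final class DescriptorRegistry {
        private final Map<String, CacheDescriptor> descriptors = new HashMap<>();

        /** Start request for a cache: register its descriptor. */
        public void onStartRequest(String cacheName) {
            descriptors.put(cacheName, new CacheDescriptor(cacheName));
        }

        /** Stop request for a cache: remove its descriptor. */
        public void onStopRequest(String cacheName) {
            descriptors.remove(cacheName);
        }

        /**
         * Failure message for an earlier start: look up the descriptor that the
         * start request registered. Returns null if it has already been removed.
         */
        public CacheDescriptor onStartFailure(String cacheName) {
            return descriptors.get(cacheName);
        }
    }

    public static void main(String[] args) {
        DescriptorRegistry node = new DescriptorRegistry();

        node.onStartRequest("A"); // start request for "A" reaches the node; the PME begins
        node.onStopRequest("A");  // stop request for "A" is processed before the PME completes

        // The start-related PME fails and its failure message is processed last.
        CacheDescriptor desc = node.onStartFailure("A");

        // Per the description above, the real code fails an assertion here when
        // assertions are enabled; otherwise the missing descriptor leads to an NPE.
        System.out.println(desc == null ? "descriptor missing" : "descriptor found");
    }
}
```

Running `main` prints `descriptor missing`: the lookup performed on behalf of the failure message happens after the stop request has already cleaned up the registry, which mirrors the last step of the scenario.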
Attachments
Issue Links
- links to