Ignite / IGNITE-16789

Failure to dynamically create a new cache can be a cause of NullPointerException/AssertionError in the discovery thread

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.14
    • Component/s: None

    Description

      Simultaneously creating and removing a cache with the same name may lead to the following AssertionError (or, with assertions disabled, a NullPointerException) in the disco-notifier thread, which in turn triggers the FailureHandler.

      [2022-04-04 14:22:41,571][ERROR][disco-notifier-worker-#36%cache.IgniteDynamicCacheStartFailTest0%][GridDiscoveryManager] Exception in discovery notifier worker thread.
      java.lang.AssertionError: Dynamic cache descriptor is missing [cacheName=TestDynamicCache]
      	at org.apache.ignite.internal.processors.cache.ClusterCachesInfo.onCacheChangeRequested(ClusterCachesInfo.java:570)
      	at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onCustomEvent(GridCacheProcessor.java:4307)
      	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:680)
      	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.access$7500(GridDiscoveryManager.java:559)
      	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4$NotificationTask.run(GridDiscoveryManager.java:994)
      	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2852)
      	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2890)
      	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
      	at java.lang.Thread.run(Thread.java:748)
      

      It looks like the issue is caused by concurrently starting and stopping caches with the same name.
      The following scenario results in the AssertionError (when assertions are disabled, it leads to the NullPointerException mentioned above):

      • the user starts a new cache with the name "A"
      • the DynamicCacheChangeRequest is sent over the cluster ring
      • every node that receives this message updates its list of registered cache descriptors (see ClusterCachesInfo.onCacheChangeRequested(DynamicCacheChangeBatch, AffinityTopologyVersion))
      • a new partition map exchange (PME) is initiated
      • the user tries to stop the cache with the same name "A"
      • a new DynamicCacheChangeRequest is sent, and it removes the descriptor of "A" from the list of registered caches
      • at this point, the previous exchange (the PME related to the cache start) fails for some reason
      • the DynamicCacheChangeFailureMessage is sent over the ring, and every node tries to look up the cache descriptor, which has already been removed.
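      To make the ordering problem concrete, here is a minimal, single-threaded Java model of one node's descriptor registry, replaying the scenario in order. CacheDescriptorRaceSketch, CacheDescriptor, and the onStartRequest/onStopRequest/onStartFailedNaive/onStartFailedDefensive methods are hypothetical names standing in for the processing of the two DynamicCacheChangeRequest messages and the DynamicCacheChangeFailureMessage; they are not Ignite's API. The defensive variant is only one possible way to tolerate the missing descriptor, not necessarily the fix that was merged.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, single-threaded model of the race described in IGNITE-16789.
 * Not Ignite code: it only replays the message ordering on one node.
 */
final class CacheDescriptorRaceSketch {
    /** Simplified stand-in for a cache descriptor (hypothetical). */
    static final class CacheDescriptor {
        final String name;

        CacheDescriptor(String name) {
            this.name = name;
        }
    }

    /** Descriptors registered on this node, keyed by cache name. */
    private final Map<String, CacheDescriptor> registered = new HashMap<>();

    /** Start request: register a descriptor for the new cache. */
    void onStartRequest(String cacheName) {
        registered.put(cacheName, new CacheDescriptor(cacheName));
    }

    /** Stop request for the same name: the descriptor is removed. */
    void onStopRequest(String cacheName) {
        registered.remove(cacheName);
    }

    /** Naive failure handling that assumes the descriptor still exists. */
    String onStartFailedNaive(String cacheName) {
        CacheDescriptor desc = registered.get(cacheName);
        // With -ea this throws AssertionError; without it, desc.name below throws NullPointerException.
        assert desc != null : "Dynamic cache descriptor is missing [cacheName=" + cacheName + "]";
        return "rolled back " + desc.name;
    }

    /** Defensive variant: tolerates a descriptor already removed by a later stop request. */
    String onStartFailedDefensive(String cacheName) {
        CacheDescriptor desc = registered.remove(cacheName);
        if (desc == null)
            return "skipped " + cacheName + " (already removed)";
        return "rolled back " + desc.name;
    }

    public static void main(String[] args) {
        CacheDescriptorRaceSketch node = new CacheDescriptorRaceSketch();
        node.onStartRequest("A"); // start request registers "A"
        node.onStopRequest("A");  // stop request for "A" is processed before the start PME fails

        System.out.println(node.onStartFailedDefensive("A"));

        try {
            node.onStartFailedNaive("A");
        } catch (AssertionError | NullPointerException e) {
            System.out.println("naive handler failed: " + e.getClass().getSimpleName());
        }
    }
}
```

      Running it with `java -ea` exercises the AssertionError path; without `-ea`, the naive handler dereferences null and throws a NullPointerException, matching the two symptoms described above.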

            People

              Assignee: Vyacheslav Koptilin (slava.koptilin)
              Reporter: Vyacheslav Koptilin (slava.koptilin)
              Vladislav Pyatkov
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m