Ignite / IGNITE-16789

Failure to dynamically create a new cache can be a cause of NullPointerException/AssertionError in the discovery thread


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 2.14
    • None

    Description

      Simultaneously creating and removing a cache with the same name may lead to the following AssertionError (or to a NullPointerException when assertions are disabled) in the disco-notifier thread, which in turn triggers the FailureHandler.

      [2022-04-04 14:22:41,571][ERROR][disco-notifier-worker-#36%cache.IgniteDynamicCacheStartFailTest0%][GridDiscoveryManager] Exception in discovery notifier worker thread.
      java.lang.AssertionError: Dynamic cache descriptor is missing [cacheName=TestDynamicCache]
      	at org.apache.ignite.internal.processors.cache.ClusterCachesInfo.onCacheChangeRequested(ClusterCachesInfo.java:570)
      	at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onCustomEvent(GridCacheProcessor.java:4307)
      	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:680)
      	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.access$7500(GridDiscoveryManager.java:559)
      	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4$NotificationTask.run(GridDiscoveryManager.java:994)
      	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2852)
      	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2890)
      	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
      	at java.lang.Thread.run(Thread.java:748)
      

      The issue appears to be caused by concurrently starting and stopping caches with the same name.
      The following scenario results in the AssertionError (when assertions are disabled, it leads to the NullPointerException mentioned above):

      • the user starts a new cache with the name "A"
      • the DynamicCacheChangeRequest is sent over the cluster ring
      • every node that receives this message updates its list of registered cache descriptors (see ClusterCachesInfo.onCacheChangeRequested(DynamicCacheChangeBatch, AffinityTopologyVersion))
      • a node initiates a new partition map exchange (PME)
      • the user tries to stop the cache with the same name "A"
      • a new DynamicCacheChangeRequest is sent, which removes the cache from the list of registered caches
      • at this point, the previous exchange (the PME related to the cache start) fails for some reason
      • the DynamicCacheChangeFailureMessage is sent over the ring and, on every node, tries to find the required cache descriptor, which has already been removed.
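
The ordering of events above can be replayed in a small, self-contained model. This is only an illustrative sketch: `DescriptorRaceSketch`, `Descriptor`, and every method name in it are hypothetical stand-ins, not Ignite classes, and the "tolerant" handler shows one possible defensive approach rather than the fix that was actually applied. The unsafe handler assumes the descriptor is always present, so it fails with an AssertionError when run with `-ea` and with a NullPointerException otherwise.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified model of the scenario; none of these types are Ignite's. */
class DescriptorRaceSketch {
    /** Stand-in for a cache descriptor; only the name matters here. */
    static final class Descriptor {
        final String cacheName;

        Descriptor(String cacheName) {
            this.cacheName = cacheName;
        }
    }

    /** Stand-in for a node's list of registered cache descriptors. */
    private final Map<String, Descriptor> registered = new ConcurrentHashMap<>();

    /** Cache start request: the node registers a descriptor. */
    void onStartRequest(String name) {
        registered.put(name, new Descriptor(name));
    }

    /** Cache stop request: the node removes the descriptor. */
    void onStopRequest(String name) {
        registered.remove(name);
    }

    /** Failure message handler that assumes the descriptor still exists. */
    String onStartFailedUnsafe(String name) {
        Descriptor desc = registered.get(name);
        assert desc != null : "Dynamic cache descriptor is missing [cacheName=" + name + "]";
        // With assertions disabled, the dereference below throws NullPointerException.
        return "rolled back " + desc.cacheName;
    }

    /** Assumed alternative: treat a missing descriptor as a stale failure message. */
    String onStartFailedTolerant(String name) {
        Descriptor desc = registered.get(name);
        if (desc == null)
            return "ignored stale failure for " + name;
        return "rolled back " + desc.cacheName;
    }

    /** Replays start -> stop -> start-failure on one node and reports the outcome. */
    static String replay(boolean tolerant) {
        DescriptorRaceSketch node = new DescriptorRaceSketch();
        node.onStartRequest("A");   // start of cache "A" is registered
        node.onStopRequest("A");    // stop of cache "A" removes the descriptor
        try {
            // The failure message for the earlier start arrives last.
            return tolerant ? node.onStartFailedTolerant("A") : node.onStartFailedUnsafe("A");
        } catch (AssertionError | NullPointerException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("unsafe:   " + replay(false));
        System.out.println("tolerant: " + replay(true));
    }
}
```

In this model the unsafe path reports either `AssertionError` or `NullPointerException` depending on whether the JVM runs with `-ea`, which matches the two symptoms named in the issue title. In a real node the exception is not caught but propagates out of the discovery notifier worker, which is what triggers the FailureHandler.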

          People

            slava.koptilin Vyacheslav Koptilin
            slava.koptilin Vyacheslav Koptilin
            Vladislav Pyatkov Vladislav Pyatkov