Uploaded image for project: 'Ignite'
  1. Ignite
  2. IGNITE-6491

Race in TopologyValidator.validate() and EVT_NODE_LEFT listener calls (split-brain activator)

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.1
    • 2.3
    • cache, general
    • None
    • Important

    Description

      The following wrong cache validate/put sequence may occur

      On node left GridDhtPartitionsExchangeFuture will be generated by the disco-event-worker thread.

      Then the exchange-worker thread does

      Split-brain detected [cacheName=test40, activatorTopVer=0, cacheTopVer=14]
      	at org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:1141)
      	at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator.validate(IgniteTopologyValidatorGridSplitCacheTest.java:307)
      	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCacheGroup(GridDhtTopologyFutureAdapter.java:64)
      	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:1456)
      	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:115)
      	at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:450)
      	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:668)
      	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2278)
      

      The result of validation is stored in grpValidRes with value of false.

      After some delay the disco-event-worker thread will do

      java.lang.Exception: Node is segment activator [cacheName=test40, activatorTopVer=14]
      	at org.apache.ignite.internal.util.IgniteUtils.dumpStack(IgniteUtils.java:1141)
      	at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator$2.apply(IgniteTopologyValidatorGridSplitCacheTest.java:360)
      	at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest$SplitAwareTopologyValidator$2.apply(IgniteTopologyValidatorGridSplitCacheTest.java:349)
      	at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager$UserListenerWrapper.onEvent(GridEventStorageManager.java:1463)
      	at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:859)
      	at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:844)
      	at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record0(GridEventStorageManager.java:341)
      	at org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record(GridEventStorageManager.java:307)
      	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.recordEvent(GridDiscoveryManager.java:2478)
      	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:2684)
      	at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2507)
      

      After this invocation the result of SplitAwareTopologyValidator.validate should be changed to true, but it was already invoked and the result has been cached in grpValidRes with the value of false.

      So any successive calls to cache.put causes to fail

      Test failed.
      java.lang.RuntimeException: tryPut() failed [gridName=cache.IgniteTopologyValidatorGridSplitCacheTest0]
      	at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest.tryPut(IgniteTopologyValidatorGridSplitCacheTest.java:262)
      	at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest.testTopologyValidator(IgniteTopologyValidatorGridSplitCacheTest.java:182)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at junit.framework.TestCase.runTest(TestCase.java:176)
      	at org.apache.ignite.testframework.junits.GridAbstractTest.runTestInternal(GridAbstractTest.java:2000)
      	at org.apache.ignite.testframework.junits.GridAbstractTest.access$000(GridAbstractTest.java:132)
      	at org.apache.ignite.testframework.junits.GridAbstractTest$5.run(GridAbstractTest.java:1915)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException: Failed to perform cache operation (cache topology is not valid): test40
      	at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1327)
      	at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:1672)
      	at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1032)
      	at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:872)
      	at org.apache.ignite.internal.processors.cache.IgniteTopologyValidatorGridSplitCacheTest.tryPut(IgniteTopologyValidatorGridSplitCacheTest.java:252)
      	... 10 more
      Caused by: class org.apache.ignite.IgniteCheckedException: Failed to perform cache operation (cache topology is not valid): test40
      	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:112)
      	at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:415)
      	at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248)
      	at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1170)
      	at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(GridDhtAtomicCache.java:659)
      	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2334)
      	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2311)
      	at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1029)
      	... 12 more
      

      The updated test IgniteTopologyValidatorGridSplitCacheTest fails frequently on my laptop with 8 nodes and 100 caches.

      Attachments

        Activity

          People

            ein Alexandr Kuramshin
            ein Alexandr Kuramshin
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: