Uploaded image for project: 'Ignite'
  1. Ignite
  2. IGNITE-17845 Make Ignite Consistent Again
  3. IGNITE-17908

AssertionError LWM after reserved on data insertion after the cluster restart

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.15
    • None

    Description

      After the cluster restart you may see the following assertion:

      java.lang.AssertionError: LWM after reserved: lwm=2030, reserved=2010, cntr=Counter [lwm=2030, missed=[], hwm=2030, reserved=2011]
      
      	at org.apache.ignite.internal.processors.cache.PartitionUpdateCounterTrackingImpl.reserve(PartitionUpdateCounterTrackingImpl.java:270)
      	at org.apache.ignite.internal.processors.cache.PartitionUpdateCounterErrorWrapper.reserve(PartitionUpdateCounterErrorWrapper.java:58)
      	at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.getAndIncrementUpdateCounter(IgniteCacheOffheapManagerImpl.java:1594)
      	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.getAndIncrementUpdateCounter(GridCacheOffheapManager.java:2483)
      	at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.getAndIncrementUpdateCounter(GridDhtLocalPartition.java:942)
      	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.calculatePartitionUpdateCounters(IgniteTxLocalAdapter.java:510)
      	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1356)
      	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:726)
      	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1132)
      	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.prepareAsyncLocal(GridNearTxLocal.java:4282)
      	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareColocatedTx(IgniteTxHandler.java:303)
      	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.proceedPrepare(GridNearOptimisticTxPrepareFuture.java:565)
      	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.prepareSingle(GridNearOptimisticTxPrepareFuture.java:392)
      	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.prepare0(GridNearOptimisticTxPrepareFuture.java:335)
      	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFutureAdapter.prepareOnTopology(GridNearOptimisticTxPrepareFutureAdapter.java:205)
      	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFutureAdapter.prepare(GridNearOptimisticTxPrepareFutureAdapter.java:129)
      	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.prepareNearTxLocal(GridNearTxLocal.java:3946)
      	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.commitNearTxLocalAsync(GridNearTxLocal.java:3994)
      	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.optimisticPutFuture(GridNearTxLocal.java:3051)
      	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.putAsync0(GridNearTxLocal.java:729)
      	at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.putAsync(GridNearTxLocal.java:484)
      	at org.apache.ignite.internal.processors.cache.GridCacheAdapter$20.op(GridCacheAdapter.java:2511)
      	at org.apache.ignite.internal.processors.cache.GridCacheAdapter$20.op(GridCacheAdapter.java:2509)
      	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:4284)
      	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put0(GridCacheAdapter.java:2509)
      	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2487)
      	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2466)
      	at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1332)
      	at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:867)
      	at org.apache.ignite.util.BrokenRebalanceTest.testCountersOnCrachRecovery(BrokenRebalanceTest.java:191)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2506)
      

      when clusted was under the load and has a counters gaps on restart.

      See LwmAfterReservedTest.java

      Possible solution is to fix PartitionUpdateCounterTrackingImpl#reservedCntr calculation
      from

      long max = Math.max(val, curLwm);
      

      to something like

      long max = Math.max(val, curLwm);
      max = Math.max(max, highestAppliedCounter());
      

      Another possible fix is to get rid or PartitionUpdateCounterTrackingImpl#reservedCntr and always use highestAppliedCounter().
      This may slowdown the system, but look like a correct fix. So, some optimisation may be required.

      Please make sure that counters are correct after the fix using idle_verify.

      Attachments

        1. LwmAfterReservedTest.java
          9 kB
          Anton Vinogradov

        Issue Links

          Activity

            People

              timonin.maksim Maksim Timonin
              av Anton Vinogradov
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m