  1. Ignite
  2. IGNITE-16621

AtomicSequence.incrementAndGet() fails intermittently.


Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.13
    • data structures
    • Fixed an issue with IgniteAtomicSequence that led to AssertionError.
    • Release Notes Required

    Description

      Using IgniteAtomicSequence can lead to the following AssertionError:

      java.lang.AssertionError: null
      at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl$2.call(GridCacheAtomicSequenceImpl.java:307)
      at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl$2.call(GridCacheAtomicSequenceImpl.java:298)
      at org.apache.ignite.internal.processors.cache.GridCacheUtils.retryTopologySafe(GridCacheUtils.java:1418)
      at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl.internalUpdate(GridCacheAtomicSequenceImpl.java:230)
      at org.apache.ignite.internal.processors.datastructures.GridCacheAtomicSequenceImpl.incrementAndGet(GridCacheAtomicSequenceImpl.java:135)
      

      The following code produces the mentioned error:

      private Callable<Long> internalUpdate(final long l, final boolean updated) {
          return new Callable<Long>() {
              @Override public Long call() throws Exception {
                  assert distUpdateFreeTop.isHeldByCurrentThread() || distUpdateLockedTop.isHeldByCurrentThread();
      
                  try (GridNearTxLocal tx = CU.txStartInternal(ctx, cacheView, PESSIMISTIC, REPEATABLE_READ)) {
                      GridCacheAtomicSequenceValue seq = cacheView.get(key);
      
                      checkRemoved();
      
                      // This assert can fail when the partition loss policy is IGNORE and the corresponding partition has been lost.
                      assert seq != null;
      

      The root cause is that, for in-memory caches, the partition loss policy is IGNORE. As a result, the following read can return null without throwing any exception when the partition holding the sequence has been lost, and that null value triggers the AssertionError shown above.

      try (GridNearTxLocal tx = CU.txStartInternal(ctx, cacheView, PESSIMISTIC, REPEATABLE_READ)) {
          GridCacheAtomicSequenceValue seq = cacheView.get(key);
      

      A possible workaround is to set a reasonable number of backups in AtomicConfiguration, so that losing a single node does not lose the partition that stores the sequence. Monitoring lost partitions would help as well.
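
      The workaround can be sketched as the configuration below. It is only an illustration: the backup count of 2 and the sequence name "exampleSeq" are assumptions chosen for the example, not values taken from this issue, and the right number of backups depends on the cluster.

      ```java
      import org.apache.ignite.Ignite;
      import org.apache.ignite.IgniteAtomicSequence;
      import org.apache.ignite.Ignition;
      import org.apache.ignite.configuration.AtomicConfiguration;
      import org.apache.ignite.configuration.IgniteConfiguration;

      public class SequenceBackupsExample {
          public static void main(String[] args) {
              // Assumption: 2 backups is an illustrative value, not a recommendation from the issue.
              AtomicConfiguration atomicCfg = new AtomicConfiguration().setBackups(2);

              // Apply the atomic configuration to all atomic data structures on this node.
              IgniteConfiguration cfg = new IgniteConfiguration().setAtomicConfiguration(atomicCfg);

              try (Ignite ignite = Ignition.start(cfg)) {
                  // "exampleSeq" is a hypothetical name; 0 is the initial value, true creates the sequence if absent.
                  IgniteAtomicSequence seq = ignite.atomicSequence("exampleSeq", 0, true);

                  System.out.println(seq.incrementAndGet());
              }
          }
      }
      ```

      Backups reduce the chance of losing the partition but do not remove it: if the primary and all backup copies leave the topology together, the sequence state is still lost.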

      The proposed solution is straightforward: replace the assertion assert seq != null; with an explicit check that throws a suitable exception. This would allow the user to detect the problem and, for example, re-create the sequence.
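
      The idea of the proposed change can be illustrated with a simplified, self-contained sketch. It is not the actual patch: a plain Map stands in for the cache view, and the names ExplicitNullCheckSketch, SequenceLostException, and incrementOrRecreate are hypothetical and introduced only for this example.

      ```java
      import java.util.Map;
      import java.util.concurrent.ConcurrentHashMap;

      class ExplicitNullCheckSketch {
          /** Hypothetical exception type; the real fix may throw a different one. */
          static class SequenceLostException extends IllegalStateException {
              SequenceLostException(String msg) {
                  super(msg);
              }
          }

          /** Increments the stored value, failing explicitly if the state is missing. */
          static long incrementAndGet(Map<String, Long> cacheView, String key) {
              Long seq = cacheView.get(key);

              // Explicit check instead of `assert seq != null;`, so the failure
              // is reported even when JVM assertions are disabled.
              if (seq == null)
                  throw new SequenceLostException("Sequence state not found, its partition may have been lost: " + key);

              long next = seq + 1;

              cacheView.put(key, next);

              return next;
          }

          /** Caller-side recovery: re-create the sequence from an initial value and retry once. */
          static long incrementOrRecreate(Map<String, Long> cacheView, String key, long initVal) {
              try {
                  return incrementAndGet(cacheView, key);
              }
              catch (SequenceLostException e) {
                  cacheView.put(key, initVal);

                  return incrementAndGet(cacheView, key);
              }
          }

          public static void main(String[] args) {
              Map<String, Long> cacheView = new ConcurrentHashMap<>();

              // The map is empty, which models a lost partition; the sequence is re-created from 100.
              System.out.println(incrementOrRecreate(cacheView, "seq", 100L));
          }
      }
      ```

      Throwing a dedicated exception lets the caller distinguish lost sequence state from other failures. Whether re-creating the sequence from an initial value is acceptable depends on the application, since previously issued values may be handed out again.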

            People

              slava.koptilin Vyacheslav Koptilin
              slava.koptilin Vyacheslav Koptilin
              Vladislav Pyatkov Vladislav Pyatkov
