  Ignite / IGNITE-16685

Expiration fails on topology version when cluster gets re-activated.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.13
    • Component/s: None
    • Release Note: Fixed expiration failure on cluster re-activation
    • Release Notes Required

    Description

      Expiration can fail on an uninitialized topology version (topVer=-1):

      java.lang.AssertionError: Invalid topology version [topVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], group=TEST_CACHE]
      	at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.readyTopologyVersion(GridDhtPartitionTopologyImpl.java:315)
      	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.nextVersion(GridCacheAdapter.java:4208)
      	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:3147)
      	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:3066)
      	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:1262)
      	at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:246)
      	at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.lambda$body$0(GridCacheSharedTtlCleanupManager.java:193)
      	at java.util.concurrent.ConcurrentHashMap.computeIfPresent(ConcurrentHashMap.java:1769)
      	at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:192)
      	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
      	at java.lang.Thread.run(Thread.java:750)
      

      Cause:

      • The cache starts and launches the expiration routines.
      • Entry expiration timeouts begin to run down.
      • The cluster gets deactivated.
      • The cluster stays deactivated longer than the expiration timeouts.
      • The cluster gets activated, caches start, and the expiration routines start anew.
      • The expiration routine finds that some records have expired and attempts to remove them.
      • During the cache start, the topology version is not yet confirmed (by a FullMessage, etc.).
      • Expiration fails with the AssertionError above.
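      The sequence above can be reduced to a small model. This is a hypothetical sketch, not Ignite code: the class, `readyTopologyVersion()`, `purgeExpired()` and the use of `-1` as the "not initialized" marker only mirror the names and values in the stack trace. It shows why removing an already-expired entry fails when the expiration worker runs after re-activation but before the topology version is confirmed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the failure sequence (not Ignite code).
// readyTopologyVersion() plays the role of
// GridDhtPartitionTopologyImpl.readyTopologyVersion(), which asserts
// that the topology version is initialized.
public class ExpiryOnReactivationSketch {
    // Mirrors "topVer=-1" in the assertion message: not initialized.
    static final long TOP_VER_NONE = -1L;

    long readyTopVer = TOP_VER_NONE;
    final List<Long> expireTimes = new ArrayList<>(); // absolute expire times, ms

    long readyTopologyVersion() {
        if (readyTopVer == TOP_VER_NONE)
            throw new AssertionError("Invalid topology version [topVer=" + readyTopVer + "]");
        return readyTopVer;
    }

    // Unpatched expiration: removing an entry needs a new entry version,
    // which in turn needs the ready topology version.
    int purgeExpired(long now) {
        int removed = 0;
        for (int i = expireTimes.size() - 1; i >= 0; i--) {
            if (expireTimes.get(i) <= now) {
                readyTopologyVersion(); // stands in for GridCacheAdapter.nextVersion()
                expireTimes.remove(i);  // remove(int index) overload
                removed++;
            }
        }
        return removed;
    }

    void activate() { readyTopVer = TOP_VER_NONE; }      // cache (re)starts, version unconfirmed
    void confirmTopology(long ver) { readyTopVer = ver; } // e.g. on the full message
    void deactivate() { readyTopVer = TOP_VER_NONE; }

    public static void main(String[] args) {
        ExpiryOnReactivationSketch grp = new ExpiryOnReactivationSketch();
        grp.activate();
        grp.confirmTopology(1L);
        grp.expireTimes.add(10_000L);                 // entry expires at t=10s
        System.out.println(grp.purgeExpired(5_000L)); // nothing expired yet

        grp.deactivate();             // cluster inactive longer than the TTL...
        grp.activate();               // ...then re-activated at t=60s
        try {
            grp.purgeExpired(60_000L); // worker runs before the version is confirmed
        } catch (AssertionError e) {
            System.out.println("Expiration fails: " + e.getMessage());
        }
    }
}
```

      Running `main` prints `0` and then `Expiration fails: Invalid topology version [topVer=-1]`. The expired entry stays in place. In this model, it is removed only after `confirmTopology()` has been called.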

      Solutions:

      1. Skip the partition if the topology version isn't initialized, in addition to the existing partition state check (state == OWNING).
      2. Start expiration later, once the topology version is confirmed.
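      Both proposed solutions can be sketched against a similar simplified model. This is again hypothetical: `CacheGroup`, `confirmTopology()` and `startExpiration()` are illustrative names, and this is not the fix that was actually committed. `purgeExpired()` follows solution 1: on top of the OWNING check, it skips a partition while the topology version is uninitialized. `startExpiration()` follows solution 2: it defers the purge until a future completes on topology confirmation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical, simplified model of the two proposed fixes. None of these
// types are Ignite classes; they only mirror the roles of the topology
// version, the partition state (OWNING) and purgeExpired().
public class ExpirySkipSketch {
    // Mirrors "topVer=-1" in the assertion message: topology not initialized.
    static final long TOP_VER_NONE = -1L;

    enum PartState { MOVING, OWNING, RENTING }

    static final class Partition {
        final PartState state;
        final List<Long> expireTimes = new ArrayList<>(); // absolute expire times, ms

        Partition(PartState state, long... expireTimes) {
            this.state = state;
            for (long t : expireTimes)
                this.expireTimes.add(t);
        }
    }

    static final class CacheGroup {
        final List<Partition> parts = new ArrayList<>();
        // Ready topology version; stays TOP_VER_NONE until the exchange confirms it.
        volatile long readyTopVer = TOP_VER_NONE;
        // Completed when the topology version is confirmed (solution 2).
        final CompletableFuture<Long> topConfirmed = new CompletableFuture<>();

        void confirmTopology(long ver) {
            readyTopVer = ver;
            topConfirmed.complete(ver);
        }

        // Solution 1: besides the OWNING check, skip the partition while the
        // topology version is not initialized instead of asserting on it.
        int purgeExpired(long now) {
            int removed = 0;
            for (Partition p : parts) {
                if (p.state != PartState.OWNING || readyTopVer == TOP_VER_NONE)
                    continue;
                int before = p.expireTimes.size();
                p.expireTimes.removeIf(t -> t <= now);
                removed += before - p.expireTimes.size();
            }
            return removed;
        }

        // Solution 2: do not run expiration until the topology is confirmed.
        CompletableFuture<Integer> startExpiration(long now) {
            return topConfirmed.thenApply(ver -> purgeExpired(now));
        }
    }

    public static void main(String[] args) {
        CacheGroup grp = new CacheGroup();
        grp.parts.add(new Partition(PartState.OWNING, 100L, 5_000L));
        grp.parts.add(new Partition(PartState.MOVING, 100L));

        // Right after re-activation: entries are already expired, topology is not confirmed.
        System.out.println("before confirm: " + grp.purgeExpired(1_000L));

        CompletableFuture<Integer> deferred = grp.startExpiration(1_000L);
        grp.confirmTopology(42L);
        System.out.println("after confirm: " + deferred.join());
    }
}
```

      Running `main` should print `before confirm: 0` and then `after confirm: 1`. The entry in the MOVING partition is never touched. The expired entry in the OWNING partition is removed only once the topology is confirmed. In this model, solution 1 is the smaller change, because a skipped partition is simply picked up again on a later pass of the cleanup worker. Solution 2 avoids the wasted pass but needs a hook on topology confirmation.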

    People

      Assignee: vladsz83 Vladimir Steshin
      Reporter: vladsz83 Vladimir Steshin
      Votes: 0
      Watchers: 3
