[IGNITE-16685] Expiration fails on topology version when cluster gets re-activated. - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.13
Component/s: None
Labels:
- ise.lts

Release Note:
Fixed expiration failure on cluster re-activation
Ignite Flags:

Release Notes Required

Description

Expiration can fail on an uninitialized topology version (topVer=-1).

java.lang.AssertionError: Invalid topology version [topVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], group=TEST_CACHE]
	at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.readyTopologyVersion(GridDhtPartitionTopologyImpl.java:315)
	at org.apache.ignite.internal.processors.cache.GridCacheAdapter.nextVersion(GridCacheAdapter.java:4208)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:3147)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:3066)
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:1262)
	at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:246)
	at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.lambda$body$0(GridCacheSharedTtlCleanupManager.java:193)
	at java.util.concurrent.ConcurrentHashMap.computeIfPresent(ConcurrentHashMap.java:1769)
	at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:192)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
	at java.lang.Thread.run(Thread.java:750)

Cause:

The cache starts and launches its expiration routines.
Expiration timeouts begin to run out.
The cluster gets deactivated.
The cluster stays deactivated longer than the expiration timeouts.
The cluster gets activated, the caches start, and the expiration routines start anew.
Expiration sees that some records have expired and attempts to remove them.
During cache start, the topology version is not confirmed yet (by FullMessage etc.).
Expiration fails with the assertion shown in the stack trace.

Solutions:

Skip a partition if its topology version isn't initialized, in addition to the existing partition state check (state==OWNING).
Start expiration later, once the topology is confirmed.
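
The first option can be sketched as a toy model. This is not the actual patch and none of the types below are Ignite classes: `ExpirationGuardSketch`, `Partition`, `canExpire` and `partitionsToPurge` are hypothetical names made up for illustration, and the topology version is reduced to a single `long` where `-1` stands for "not initialized" (as in the assertion message).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the proposed guard. Hypothetical types, not Ignite code. */
public class ExpirationGuardSketch {
    /** Assumption: -1 marks an uninitialized topology version, as in the assertion message. */
    static final long UNINITIALIZED_TOP_VER = -1;

    /** Simplified stand-in for partition states. */
    enum PartitionState { MOVING, OWNING, RENTING, EVICTED, LOST }

    /** Hypothetical partition descriptor with the ready topology version seen by its cache group. */
    static final class Partition {
        final int id;
        final PartitionState state;
        final long readyTopVer;

        Partition(int id, PartitionState state, long readyTopVer) {
            this.id = id;
            this.state = state;
            this.readyTopVer = readyTopVer;
        }
    }

    /** Existing check (state == OWNING) plus the additional topology version check. */
    static boolean canExpire(Partition p) {
        return p.state == PartitionState.OWNING && p.readyTopVer != UNINITIALIZED_TOP_VER;
    }

    /** Returns ids of partitions that are safe to purge now; the rest are skipped until a later pass. */
    static List<Integer> partitionsToPurge(List<Partition> parts) {
        List<Integer> res = new ArrayList<>();

        for (Partition p : parts) {
            if (canExpire(p))
                res.add(p.id);
        }

        return res;
    }

    public static void main(String[] args) {
        List<Partition> parts = Arrays.asList(
            new Partition(0, PartitionState.OWNING, UNINITIALIZED_TOP_VER), // Just re-activated: skipped.
            new Partition(1, PartitionState.OWNING, 5),                     // Confirmed topology: purged.
            new Partition(2, PartitionState.MOVING, 5)                      // Not OWNING: skipped.
        );

        System.out.println(partitionsToPurge(parts)); // prints [1]
    }
}
```

With such a guard, entries in a skipped partition are not lost. They stay expired and would be picked up by a later expiration pass, after the topology version has been confirmed.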

Issue Links

links to

GitHub Pull Request #9887

GitHub Pull Request #9888

People

Assignee: Vladimir Steshin

Reporter: Vladimir Steshin

Votes: 0

Watchers: 3

Dates

Created: 14/Mar/22 07:57

Updated: 13/May/22 09:21

Resolved: 25/Mar/22 09:35