[IGNITE-8122] Partition state restored from WAL may be lost if no checkpoints are done - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4
Fix Version/s: 2.5
Component/s: cache
Labels: None

Description

Problem:
1) Start several nodes with persistence enabled.
2) Make sure that all partitions of 'ignite-sys-cache' are in the OWN state on all nodes and that the corresponding PartitionMetaStateRecord is logged to the WAL.
3) Stop all nodes, start them again, and activate the cluster. The checkpoint for 'ignite-sys-cache' is empty because the cache contained no data.
4) The state of all partitions is restored to OWN from the WAL (GridCacheDatabaseSharedManager#restoreState), but it is not written to page memory, because there were no checkpoints and no data in the cache. The store manager has no allocated pages (including meta pages) for such partitions.
5) When the exchange is done, every node tries to restore partition states again (initPartitionsWhenAffinityReady). Because page memory is empty, all partitions are restored to MOVING by default.
6) All nodes start to rebalance partitions from each other, and the process becomes unpredictable because the nodes are trying to rebalance from MOVING partitions.
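
The interaction between steps 4 and 5 can be illustrated with a small toy model. This is only a sketch under stated assumptions: `PartitionStateSketch`, `PageMemory`, `restoreFromWal`, `stateOnExchange` and the `writeThrough` flag are hypothetical names invented for illustration, not Ignite classes or APIs, and the `writeThrough` path shows one conceivable way to avoid the loss, not necessarily what the linked pull requests do.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of steps 4 and 5; none of these types are Ignite classes.
final class PartitionStateSketch {
    enum State { MOVING, OWN }

    /** Stand-in for page memory: holds a state only for partitions that were ever written. */
    static final class PageMemory {
        private final Map<Integer, State> meta = new HashMap<>();

        void write(int part, State state) {
            meta.put(part, state);
        }

        /** Falls back to MOVING when nothing was written for the partition (step 5). */
        State readOrDefault(int part) {
            return meta.getOrDefault(part, State.MOVING);
        }
    }

    /**
     * Step 4: replay partition states from WAL records into an in-memory view.
     * The writeThrough flag is an assumption of this sketch, not a real Ignite option.
     */
    static Map<Integer, State> restoreFromWal(Map<Integer, State> walRecords,
                                              PageMemory mem,
                                              boolean writeThrough) {
        Map<Integer, State> restored = new HashMap<>(walRecords);
        if (writeThrough) {
            for (Map.Entry<Integer, State> e : walRecords.entrySet()) {
                mem.write(e.getKey(), e.getValue());
            }
        }
        return restored;
    }

    /** Step 5: on exchange done, the state is re-read from page memory. */
    static State stateOnExchange(PageMemory mem, int part) {
        return mem.readOrDefault(part);
    }

    public static void main(String[] args) {
        Map<Integer, State> wal = new HashMap<>();
        wal.put(0, State.OWN); // state logged to the WAL before restart (step 2)

        // Reported behaviour: the restored state never reaches page memory.
        PageMemory empty = new PageMemory();
        restoreFromWal(wal, empty, false);
        System.out.println(stateOnExchange(empty, 0)); // prints MOVING

        // Hypothetical mitigation: persist the restored state during WAL replay.
        PageMemory written = new PageMemory();
        restoreFromWal(wal, written, true);
        System.out.println(stateOnExchange(written, 0)); // prints OWN
    }
}
```

In the first run the OWN state exists only in the in-memory view returned by `restoreFromWal`, so the later read from the empty page memory falls back to MOVING, which matches the behaviour described in step 5.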

Issue Links

links to

GitHub Pull Request #3745

GitHub Pull Request #4080

GitHub Pull Request #4081

People

Assignee: Alexey Goncharuk
Reporter: Pavel Kovalenko
Votes: 0
Watchers: 3

Dates

Created: 03/Apr/18 09:28
Updated: 11/Dec/18 10:03
Resolved: 18/Apr/18 13:08