Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Description
The use case is:
- A grid with a partitioned cache with 2 backups (or a replicated cache).
- Node-1 is killed in the middle of a checkpoint and then started again.
- Node-1 detects the unfinished checkpoint and tries to recover it.
- Node-2 is killed while Node-1's recovery is still in progress.
- Node-1 fails with an AssertionError.
Logs, the parsed WAL, and a reproducer are attached.
The issue can be reproduced with IgnitePdsContinuousRestartTest with minor changes:
two nodes must be flapping, and the nodes must be killed ungracefully.