There is a race between writeCheckpointPages and the page replacement process:
- The checkpointer thread begins a checkpoint.
- The checkpointer thread calls getPageForCheckpoint(), which copies the page content and clears the dirty flag.
- Page replacement looks for a page to evict and chooses this page. Because its dirty flag is already cleared, the page is thrown away without being written back.
- Before the checkpointer writes its copy of the page to the store, the page is acquired again and reloaded from the store.
As a result, an older copy of the page is brought back into memory, which causes all kinds of corruption exceptions and assertion failures.
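The interleaving described above can be replayed deterministically in a small single-threaded model. This is only an illustrative sketch: the classes `PageStore`, `PageMemory` and `Page`, and the function `reproduce_race()`, are hypothetical and are not the real page memory implementation. The steps that run on separate threads in the real system are simply executed in the problematic order.

```python
from dataclasses import dataclass

# Hypothetical model of the race; none of these names are real product APIs.


@dataclass
class Page:
    version: int
    data: str
    dirty: bool = False


class PageStore:
    """Persistent store: page_id -> (version, data)."""

    def __init__(self):
        self.pages = {}

    def read(self, page_id):
        return self.pages[page_id]

    def write(self, page_id, snapshot):
        self.pages[page_id] = snapshot


class PageMemory:
    def __init__(self, store):
        self.store = store
        self.loaded = {}  # page_id -> Page resident in memory

    def acquire(self, page_id):
        # Load the page from the store if it is not resident.
        if page_id not in self.loaded:
            version, data = self.store.read(page_id)
            self.loaded[page_id] = Page(version, data)
        return self.loaded[page_id]

    def update(self, page_id, data):
        page = self.acquire(page_id)
        page.version += 1
        page.data = data
        page.dirty = True

    def get_page_for_checkpoint(self, page_id):
        # Copy the page content and clear the dirty flag.
        page = self.loaded[page_id]
        page.dirty = False
        return (page.version, page.data)

    def try_evict(self, page_id):
        # Replacement treats a clean page as safe to drop without write-back.
        if not self.loaded[page_id].dirty:
            del self.loaded[page_id]
            return True
        return False


def reproduce_race():
    store = PageStore()
    store.write(1, (1, "v1"))
    mem = PageMemory(store)
    mem.update(1, "v2")                        # page is now version 2, dirty
    snapshot = mem.get_page_for_checkpoint(1)  # copy taken, dirty flag cleared
    mem.try_evict(1)                           # clean page is thrown away
    page = mem.acquire(1)                      # reloaded before the write-back
    stale = (page.version, page.data)
    store.write(1, snapshot)                   # checkpoint write lands too late
    return snapshot, stale


snapshot, stale = reproduce_race()
print(snapshot, stale)  # prints (2, 'v2') (1, 'v1')
```

The reloaded page reports version 1 even though version 2 had already been handed to the checkpointer, which is exactly the "older copy brought back into memory" symptom.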
The attached unit test demonstrates the issue. It is likely that all baselines starting from 2.4 are affected.
As part of this ticket, we must add more unit tests for the checkpointing protocol invariants we rely on.
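One invariant the race violates could be phrased as: once the checkpointer has copied a page at some version, no later load of that page may observe an older version. A minimal sketch of such a check over a recorded event trace is shown below. The event kinds (`"load"`, `"cp_copy"`, `"cp_write"`) and the function `find_stale_loads()` are hypothetical, chosen only for illustration; a real test would need hooks into the actual page memory to record these events.

```python
# Hypothetical invariant check over a recorded trace of page events.
# Each event is (kind, page_id, version):
#   "cp_copy"  - the checkpointer copied the page at this version
#   "cp_write" - the checkpointer wrote that copy to the store
#   "load"     - the page was read from the store at this version


def find_stale_loads(trace):
    """Return (page_id, loaded_version, copied_version) for every load that
    observed an older version than one already copied by the checkpointer."""
    latest_copied = {}
    violations = []
    for kind, page_id, version in trace:
        if kind == "cp_copy":
            latest_copied[page_id] = max(version, latest_copied.get(page_id, version))
        elif kind == "load":
            if version < latest_copied.get(page_id, version):
                violations.append((page_id, version, latest_copied[page_id]))
    return violations


# Interleaving from the ticket: the page is reloaded before the write-back.
racy = [("load", 1, 1), ("cp_copy", 1, 2), ("load", 1, 1), ("cp_write", 1, 2)]
# Safe interleaving: the write-back completes before the page is reloaded.
safe = [("load", 1, 1), ("cp_copy", 1, 2), ("cp_write", 1, 2), ("load", 1, 2)]

print(find_stale_loads(racy))  # prints [(1, 1, 2)]
print(find_stale_loads(safe))  # prints []
```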