Ignite / IGNITE-12081

Page replacement can reload invalid page during checkpoint

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.7.6
    • Component/s: None
    • Labels:
    • Release Note:
      Fixed an issue that could cause data corruption during checkpointing
    • Ignite Flags:
      Docs Required, Release Notes Required

      Description

      There is a race between writeCheckpointPages and the page replacement process:

      • The checkpointer thread begins a checkpoint.
      • The checkpointer thread calls getPageForCheckpoint(), which copies the page content and clears the dirty flag.
      • Page replacement looks for a page to evict and chooses this page. Because the page is no longer marked dirty, it is thrown away without being written to the store.
      • Before the checkpointer writes the copied page to the store, the page is acquired again and is therefore read from the store.

      As a result, an older copy of the page is brought back into memory, which causes a variety of corruption exceptions and assertion failures.
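
The interleaving above can be replayed deterministically in a toy model. This is only a sketch under stated assumptions, not Ignite's page memory: the class, the maps, and every method except the name getPageForCheckpoint are hypothetical, and replacement is simplified to evict only clean pages without any write-back.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of the race; none of this is Ignite's actual code.
class CheckpointRaceSketch {
    // An in-memory page: its content plus a dirty flag.
    static final class Page {
        final String content;
        boolean dirty;

        Page(String content, boolean dirty) {
            this.content = content;
            this.dirty = dirty;
        }
    }

    final Map<Long, String> store = new HashMap<>();  // persistent page store
    final Map<Long, Page> memory = new HashMap<>();   // pages resident in memory

    // Copies the page content for the checkpoint and clears the dirty flag.
    String getPageForCheckpoint(long pageId) {
        Page page = memory.get(pageId);
        page.dirty = false;
        return page.content;
    }

    // Simplified replacement: a clean page is dropped without write-back.
    boolean replacePage(long pageId) {
        Page page = memory.get(pageId);
        if (page == null || page.dirty)
            return false;
        memory.remove(pageId);
        return true;
    }

    // Returns the page content, loading the page from the store if it is not resident.
    String acquirePage(long pageId) {
        return memory.computeIfAbsent(pageId, id -> new Page(store.get(id), false)).content;
    }

    // Replays the problematic interleaving and returns the content seen on re-acquire.
    static String runRace() {
        CheckpointRaceSketch mem = new CheckpointRaceSketch();
        long pageId = 1L;

        mem.store.put(pageId, "v1");                  // store holds the old version
        mem.memory.put(pageId, new Page("v2", true)); // memory holds a newer, dirty version

        String copy = mem.getPageForCheckpoint(pageId); // checkpointer copies "v2", page is now clean
        mem.replacePage(pageId);                        // replacement evicts the clean page
        String reloaded = mem.acquirePage(pageId);      // page is re-read from the store
        mem.store.put(pageId, copy);                    // checkpoint write lands too late

        return reloaded;
    }

    public static void main(String[] args) {
        System.out.println("reloaded=" + runRace());
    }
}
```

In this model the re-acquired page holds the old content even though the checkpoint later writes the newer copy to the store, so any subsequent update is applied on top of stale data.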

      The attached unit test demonstrates the issue. All versions starting from 2.4 are likely affected.

      As part of this ticket, we must add more unit tests for the checkpointing protocol invariants we rely on.

              People

              • Assignee: Dmitriy Govorukhin
              • Reporter: Dmitriy Govorukhin
              • Reviewer: Dmitry Pavlov
              • Votes: 0
              • Watchers: 3

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 0.5h