Ignite / IGNITE-7278

Node failed to recover partition from WAL on unstable topology


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4
    • Component/s: persistence
    • Labels: None

    Description

      The use case is:
      - A grid with a partitioned cache with 2 backups (or a replicated cache).
      - Node-1 is killed in the middle of a checkpoint and then restarted.
      - Node-1 detects the unfinished checkpoint and tries to recover from it.
      - Node-2 is killed while Node-1's recovery is still in progress.
      - Node-1 fails with an AssertionError.

      Logs, a parsed WAL, and a reproducer are attached.

      The issue can be reproduced with IgnitePdsContinuousRestartTest with minor changes:
      two nodes must be flapping, and the nodes must be killed ungracefully.
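
      For readers unfamiliar with the "unfinished checkpoint" step above, here is a minimal, simplified model of how such a state could be detected on restart. It is an illustration only, not Ignite's recovery code: the class name, the method name, and the "<id>-START" / "<id>-END" marker naming are all assumptions made for this sketch. The idea is that a checkpoint whose start marker has no matching end marker was interrupted and needs recovery.

      ```java
      import java.util.ArrayList;
      import java.util.HashSet;
      import java.util.List;
      import java.util.Set;

      /** Simplified illustration only; NOT Ignite's actual recovery logic. */
      final class CheckpointMarkerScan {
          private static final String START = "-START"; // hypothetical marker suffix
          private static final String END = "-END";     // hypothetical marker suffix

          private CheckpointMarkerScan() {}

          /**
           * Returns the ids of checkpoints that have a START marker but no END marker,
           * in the order their START markers appear in the input.
           */
          static List<String> unfinished(List<String> markers) {
              // First pass: collect the ids of all checkpoints that finished.
              Set<String> ended = new HashSet<>();
              for (String m : markers) {
                  if (m.endsWith(END)) {
                      ended.add(m.substring(0, m.length() - END.length()));
                  }
              }
              // Second pass: a started checkpoint without an END marker was interrupted.
              List<String> result = new ArrayList<>();
              for (String m : markers) {
                  if (m.endsWith(START)) {
                      String id = m.substring(0, m.length() - START.length());
                      if (!ended.contains(id)) {
                          result.add(id);
                      }
                  }
              }
              return result;
          }
      }
      ```

      In this model, a node that was killed mid-checkpoint would see a START marker without an END marker for its last checkpoint after restart, which corresponds to the point in the scenario where Node-1 begins recovery.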

      Attachments

        1. page_corrupted2.tar.gz
          21.31 MB
          Andrey Mashenkov

          People

            Assignee: Dmitriy Govorukhin (DmitriyGovorukhin)
            Reporter: Andrey Mashenkov (amashenkov)
            Votes: 1
            Watchers: 7

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 20m