  1. Ignite
  2. IGNITE-17845 Make Ignite Consistent Again
  3. IGNITE-17793

Historical rebalance must use HWM instead of LWM when seeking the proper checkpoint, to avoid data loss


Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.15
    • None
    • Fixed potential data loss on historical rebalance
    • Docs Required, Release Notes Required

    Description

      Currently, historical rebalance in CheckpointHistory#searchEarliestWalPointer searches for the newest checkpoint whose counter is less than the counter of the lowest entry that has to be rebalanced.

      Unfortunately, we may have more than one checkpoint with the same counter, and in that case the newest one cannot be used as the rebalance start point.

      For example, we have a partition with LWM=100, some gaps, and HWM=200.
      A checkpoint taken at this point will have counter == 100.
      Then we may close some of the gaps, excluding 101 (to keep LWM == 100).
      Again, a checkpoint taken at this point will have counter == 100.
      The newest checkpoint (marked with counter 100) will not cover all committed entries with counter > 100, because some of them were written before it.
      Then let's close the rest of the gaps to make historical rebalance possible.
      After the rebalance finishes, we'll see the warning "Some partition entries were missed during historical rebalance" and an inconsistent cluster state.

      See the reproducer in HistoricalRebalanceCheckpointTest.java

      A possible solution is to use HWM instead of LWM during the search.
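
      The scenario above can be sketched with a toy model. This is not Ignite code: the class, the Checkpoint fields, the WAL represented as a list of update counters, and the exact gap layout (151..200 applied first, then 102..150, then 101) are all assumptions made for illustration. It only shows why a search keyed on LWM can pick a checkpoint that is too new, while the same search keyed on HWM falls back to an older one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model; names and structure are NOT taken from Ignite.
class HistoricalRebalanceSketch {
    /** Counters captured at checkpoint time plus the WAL position where it was taken. */
    static final class Checkpoint {
        final long lwm, hwm;
        final int walPos;

        Checkpoint(long lwm, long hwm, int walPos) {
            this.lwm = lwm;
            this.hwm = hwm;
            this.walPos = walPos;
        }
    }

    /** Newest checkpoint whose counter (LWM or HWM) is less than the first needed entry. */
    static Checkpoint search(List<Checkpoint> history, long firstNeeded, boolean useHwm) {
        Checkpoint found = null;
        for (Checkpoint cp : history) { // history is ordered oldest to newest
            long cntr = useHwm ? cp.hwm : cp.lwm;
            if (cntr < firstNeeded)
                found = cp;
        }
        return found;
    }

    /** Builds the LWM=100 / HWM=200 scenario and counts entries a WAL replay would miss. */
    static long missedInScenario(boolean useHwm) {
        List<Long> wal = new ArrayList<>();
        List<Checkpoint> history = new ArrayList<>();

        history.add(new Checkpoint(100, 100, wal.size())); // no gaps yet
        for (long c = 151; c <= 200; c++) wal.add(c);      // out-of-order updates: HWM=200, LWM=100
        history.add(new Checkpoint(100, 200, wal.size())); // first checkpoint with counter 100
        for (long c = 102; c <= 150; c++) wal.add(c);      // close some gaps, but not 101
        history.add(new Checkpoint(100, 200, wal.size())); // second checkpoint, still counter 100
        wal.add(101L);                                     // close the last gap

        Checkpoint start = search(history, 101, useHwm);
        Set<Long> replayed = new HashSet<>(wal.subList(start.walPos, wal.size()));

        long missed = 0;
        for (long c = 101; c <= 200; c++)
            if (!replayed.contains(c))
                missed++;
        return missed;
    }

    public static void main(String[] args) {
        System.out.println("missed with LWM search: " + missedInScenario(false)); // 99
        System.out.println("missed with HWM search: " + missedInScenario(true));  // 0
    }
}
```

      In this model the LWM-based search starts from the second checkpoint marked 100 and replays only entry 101, whereas the HWM-based search skips both checkpoints with HWM=200 and starts from the older one, so every entry above 100 is replayed.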

          People

            vladsz83 Vladimir Steshin
            av Anton Vinogradov