Ignite / IGNITE-18495

Fix RAFT snapshot installation hang due to response swap on retry


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 3.0.0-beta2
    • None

    Description

      The scenario is as follows:

      1. An InstallSnapshot request is sent; its processing hangs indefinitely (it is only cancelled in step 3).
      2. After a timeout, a second InstallSnapshot request is sent with the same index and term as the first. JRaft handles this case specially: processing of the previous request is NOT cancelled.
      3. After another timeout, a third InstallSnapshot request is sent with a DIFFERENT index, so it cancels the first snapshot's processing, which unblocks the first thread.

      In the original JRaft implementation, the first thread fails to clean up after being unblocked. As a result, every subsequent retry sees a phantom of an unfinished snapshot and the snapshotting process is jammed. Node stop might also get stuck, because one 'download' task remains unfinished forever.
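
The failure mode described above can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not JRaft's code: the class `SnapshotInstallSketch`, the `Download` record of index and term, the `runningJobs` counter and the `alwaysCleanUp` switch are all hypothetical names, and the state that actually leaks in JRaft may differ. Callbacks run inline rather than on separate threads so that the outcome is deterministic.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Simplified, hypothetical model of the scenario; not JRaft's actual code or API. */
public class SnapshotInstallSketch {
    /** One in-flight snapshot download, identified by index and term. */
    private static final class Download {
        final long index;
        final long term;
        /** Completed when the copy finishes, cancelled when the download is superseded. */
        final CompletableFuture<Void> copy = new CompletableFuture<>();

        Download(long index, long term) {
            this.index = index;
            this.term = term;
        }
    }

    private final AtomicReference<Download> current = new AtomicReference<>();
    private final AtomicInteger runningJobs = new AtomicInteger();
    private final boolean alwaysCleanUp;

    public SnapshotInstallSketch(boolean alwaysCleanUp) {
        this.alwaysCleanUp = alwaysCleanUp;
    }

    /** Handles one InstallSnapshot request. */
    public void install(long index, long term) {
        Download prev = current.get();
        if (prev != null && prev.index == index && prev.term == term) {
            return; // Step 2: same index and term, the running download is NOT cancelled.
        }
        Download mine = new Download(index, term);
        runningJobs.incrementAndGet();
        current.set(mine);
        // Runs inline when the copy completes or is cancelled.
        mine.copy.whenComplete((ok, err) -> finish(mine));
        if (prev != null) {
            prev.copy.cancel(false); // Step 3: a different index unblocks the old download.
        }
    }

    /** Called once per download, after it finished or was cancelled. */
    private void finish(Download mine) {
        boolean stillCurrent = current.compareAndSet(mine, null);
        if (stillCurrent || alwaysCleanUp) {
            runningJobs.decrementAndGet();
        }
        // Modelled bug: a superseded download returns here without releasing its job.
    }

    /** Simulates a successful end of the copy for the current download, if any. */
    public void completeCurrentCopy() {
        Download d = current.get();
        if (d != null) {
            d.copy.complete(null);
        }
    }

    public int runningJobs() {
        return runningJobs.get();
    }

    /** Replays steps 1 to 3, lets the last download finish, and returns the jobs still counted. */
    public static int leakedJobsAfterScenario(boolean alwaysCleanUp) {
        SnapshotInstallSketch node = new SnapshotInstallSketch(alwaysCleanUp);
        node.install(10, 1); // step 1: first request, its copy never completes by itself
        node.install(10, 1); // step 2: retry with the same index and term
        node.install(11, 1); // step 3: different index cancels the first download
        node.completeCurrentCopy();
        return node.runningJobs();
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + leakedJobsAfterScenario(false));
        System.out.println("with cleanup:    " + leakedJobsAfterScenario(true));
    }
}
```

In this model the variant without unconditional cleanup is left with one job that is never released, which is the kind of leftover a stop routine waiting for all jobs would block on. Releasing the job regardless of whether the download is still the current one removes the leak.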

          People

            rpuch Roman Puchkovskiy
            rpuch Roman Puchkovskiy
            Mirza Aliev
            Votes: 0
            Watchers: 2

              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 20m