  1. Ratis
  2. RATIS-1841

Fixed bug where cluster restart failed to transfer snapshot

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.1, 2.5.0
    • Fix Version/s: 2.5.2
    • Component/s: gRPC
    • Labels: None

    Description

      Hi, we have discovered that the problem we reported earlier is still recurring: a multi-replica cluster may fail to restart after transferring an outdated snapshot.

      Upon further investigation, we found that the issue originates from the resetClient function. Specifically, there is a flaw in its logic that causes it to incorrectly set the nextIndex of followers to 0, which leads to the error message shown in the attached screenshot.

      Upon reviewing the code, we determined that the issue arose only after merging the PR. Surprisingly, the code was correct prior to merging.

      After investigating further, we determined that the solution is to remove the index check, because the conditions onError and request == null are sufficient to cover the required cases.
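
The guard described above can be illustrated with a minimal sketch. This is not the Ratis implementation: the class ResetClientSketch, the FollowerState type, the method nextIndexAfterReset, and the fallback computation are all hypothetical names and assumptions made for illustration. The only part taken from the report is the condition itself: when onError is true and there is no request, nextIndex is left unchanged, with no additional index check.

```java
// Hypothetical, simplified model; not the actual Ratis resetClient code.
final class ResetClientSketch {

    /** Assumed stand-in for the leader's view of a follower. */
    static final class FollowerState {
        final long matchIndex;
        final long nextIndex;

        FollowerState(long matchIndex, long nextIndex) {
            this.matchIndex = matchIndex;
            this.nextIndex = nextIndex;
        }
    }

    /**
     * Returns the nextIndex to use after the client is reset.
     *
     * @param previousLogIndex index preceding the failed request's entries,
     *                         or null when there is no request (request == null)
     * @param onError          true when the reset was triggered by an error
     */
    static long nextIndexAfterReset(FollowerState follower,
                                    Long previousLogIndex,
                                    boolean onError) {
        // Guard from the report: onError and request == null alone decide
        // that nextIndex stays unchanged; no extra index check is applied.
        if (onError && previousLogIndex == null) {
            return follower.nextIndex;
        }
        // Assumption for this sketch: otherwise resume right after the
        // request's previous entry, or after matchIndex if there is none.
        long base = (previousLogIndex != null) ? previousLogIndex : follower.matchIndex;
        // Resetting only ever moves nextIndex backwards, never forwards.
        return Math.min(follower.nextIndex, base + 1);
    }
}
```

With this shape, a follower that has just received a snapshot and then hits an error with no pending request keeps its current nextIndex regardless of its matchIndex, which is the behavior the proposed fix aims for.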

      PTAL~szetszwo

      Attachments

        1. screenshot-1.png
          777 kB
          Xinyu Tan
        2. image-2023-05-13-10-50-14-537.png
          449 kB
          Xinyu Tan

        Activity

          People

            Assignee: tanxinyu Xinyu Tan
            Reporter: tanxinyu Xinyu Tan
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: