[RATIS-1841] Fixed bug where cluster restart failed to transfer snapshot - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.1, 2.5.0
Fix Version/s: 2.5.2
Component/s: gRPC
Labels: None

Description

Hi, we have discovered that the problem we reported earlier is still recurring: a multi-replica cluster may fail to restart after transferring an outdated snapshot.

Upon further investigation, we found that the issue originates from the resetClient function. Specifically, there is a flaw in its logic that causes it to incorrectly set the nextIndex of followers to 0, which leads to the error message shown in the attached screenshot.

Upon reviewing the code, we determined that the issue arose only after merging the PR. Surprisingly, the code was correct prior to merging.

After investigating further, we determined that the solution was to remove the index check, as the conditions onError and request == null already cover the cases that need to be handled.
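To illustrate the reasoning, here is a minimal, self-contained sketch. It is not the actual Ratis code: the class ResetClientSketch, the Follower type, the reset method, and the withIndexCheck flag are all hypothetical. It also assumes that an unknown match index is represented as -1 and that the removed check compared the match index with 0; neither detail is stated in this issue.

```java
import java.util.OptionalLong;

public class ResetClientSketch {
    // Assumption: -1 marks "no index known yet"; not taken from the issue text.
    static final long UNKNOWN_INDEX = -1;

    /** Hypothetical stand-in for the per-follower state kept by the leader. */
    static final class Follower {
        long matchIndex;
        long nextIndex;

        Follower(long matchIndex, long nextIndex) {
            this.matchIndex = matchIndex;
            this.nextIndex = nextIndex;
        }
    }

    /**
     * Simulates a client reset and returns the follower's nextIndex afterwards.
     *
     * @param previousLogIndex index preceding the failed request; empty models request == null
     * @param onError          true when the reset was triggered by an error
     * @param withIndexCheck   true to also require matchIndex == 0 (the extra condition)
     */
    static long reset(Follower f, OptionalLong previousLogIndex,
                      boolean onError, boolean withIndexCheck) {
        boolean requestIsNull = !previousLogIndex.isPresent();
        boolean keepNextIndex = onError && requestIsNull
                && (!withIndexCheck || f.matchIndex == 0);
        if (keepNextIndex) {
            return f.nextIndex; // leave nextIndex unchanged and retry
        }
        // Otherwise fall back to just after the last known position, never moving forward.
        long candidate = 1 + previousLogIndex.orElse(f.matchIndex);
        f.nextIndex = Math.min(f.nextIndex, candidate);
        return f.nextIndex;
    }

    public static void main(String[] args) {
        // After a restart the leader has no match index for the follower yet.
        Follower a = new Follower(UNKNOWN_INDEX, 100);
        System.out.println(reset(a, OptionalLong.empty(), true, true));

        Follower b = new Follower(UNKNOWN_INDEX, 100);
        System.out.println(reset(b, OptionalLong.empty(), true, false));
    }
}
```

Under these assumptions, the first call (with the extra index condition) drops nextIndex to 0, while the second call (onError and request == null only) leaves it at 100.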

PTAL~szetszwo

Attachments

- image-2023-05-13-10-50-14-537.png (13/May/23 02:50, 449 kB, Xinyu Tan)
- screenshot-1.png (13/May/23 02:59, 777 kB, Xinyu Tan)

People

Assignee: Xinyu Tan

Reporter: Xinyu Tan

Votes: 0

Watchers: 2

Dates

Created: 13/May/23 02:57

Updated: 13/May/23 08:40

Resolved: 13/May/23 08:39