[RATIS-2148] Snapshot transfer may cause followers to trigger reloadStateMachine incorrectly - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.1.0
Fix Version/s: 3.1.1
Component/s: snapshot
Labels: None

Description

gRPC streaming snapshot transfer sends all requests at once, handles errors only after everything has been sent, and uses the last snapshot request as the completion flag. As a result, the last request can be received successfully even though an earlier request has failed. The sender only handles the failure later, when it retransmits the snapshot. Meanwhile, the receiver triggers state.reloadStateMachine because it successfully received the last request, even though the snapshot was not received completely.

An MD5 mismatch exception occurred before the last SnapshotRequest was received.

The last snapshot request then arrived, was received successfully, and the index was updated.

However, the snapshot was received incompletely, and reloadStateMachine was still triggered.
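
To make the failure sequence concrete, here is a minimal Python model of it. It is a sketch, not Ratis code: `Chunk`, `make_chunks`, `NaiveReceiver` and `stream_all` are hypothetical names, the `done` field models the last request acting as the completion flag, and setting `reloaded` stands in for calling `state.reloadStateMachine`. A corrupted middle chunk fails its MD5 check, yet the receiver still "reloads" when the last request arrives.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for one request in the snapshot stream."""
    index: int
    data: bytes
    md5: str
    done: bool  # the last request doubles as the completion flag


def make_chunks(payload: bytes, size: int) -> list[Chunk]:
    """Split a payload into chunks, marking the final one as done."""
    parts = [payload[i:i + size] for i in range(0, len(payload), size)]
    return [
        Chunk(i, part, hashlib.md5(part).hexdigest(), done=(i == len(parts) - 1))
        for i, part in enumerate(parts)
    ]


class NaiveReceiver:
    """Models the reported behavior: each request is judged on its own."""

    def __init__(self) -> None:
        self.received: dict[int, bytes] = {}
        self.reloaded = False  # stands in for state.reloadStateMachine having run

    def on_request(self, chunk: Chunk) -> bool:
        if hashlib.md5(chunk.data).hexdigest() != chunk.md5:
            return False  # this request fails, but nothing stops later ones
        self.received[chunk.index] = chunk.data
        if chunk.done:
            self.reloaded = True  # runs whenever the last request succeeds, complete or not
        return True


def stream_all(receiver: NaiveReceiver, chunks: list[Chunk]) -> list[bool]:
    """Send every request first and only collect the replies afterwards."""
    return [receiver.on_request(chunk) for chunk in chunks]


if __name__ == "__main__":
    chunks = make_chunks(b"abcdefghij", 4)
    chunks[1].data = b"XXXX"  # corrupt a middle request so its MD5 check fails
    receiver = NaiveReceiver()
    print(stream_all(receiver, chunks), receiver.reloaded)
    # prints: [True, False, True] True
```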

I suggest one of the following fixes:
- Use a flag to record whether any request in the snapshot stream has failed. Once an exception occurs, the remaining requests in that stream are not processed.
- Have the sender wait for the receiver's reply and resend if the reply reports an error.
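
The first suggestion (a per-stream failure flag) could look roughly like the following sketch. It uses the same hypothetical `Chunk` model as a plain Python illustration and is not the actual change from the linked pull request; `FlaggedReceiver` and its `failed` field are illustrative names. The receiver is assumed to be created fresh for each snapshot stream, so a retransmission starts with the flag cleared.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for one request in the snapshot stream."""
    index: int
    data: bytes
    md5: str
    done: bool  # the last request doubles as the completion flag


def make_chunks(payload: bytes, size: int) -> list[Chunk]:
    """Split a payload into chunks, marking the final one as done."""
    parts = [payload[i:i + size] for i in range(0, len(payload), size)]
    return [
        Chunk(i, part, hashlib.md5(part).hexdigest(), done=(i == len(parts) - 1))
        for i, part in enumerate(parts)
    ]


class FlaggedReceiver:
    """One instance per snapshot stream; a retransmission gets a new instance."""

    def __init__(self) -> None:
        self.received: dict[int, bytes] = {}
        self.failed = False    # set once any request in this stream fails
        self.reloaded = False  # stands in for state.reloadStateMachine having run

    def on_request(self, chunk: Chunk) -> bool:
        if self.failed:
            return False  # do not process the rest of a stream that already failed
        if hashlib.md5(chunk.data).hexdigest() != chunk.md5:
            self.failed = True
            return False
        self.received[chunk.index] = chunk.data
        # Extra guard: reload only if every index up to this one has been stored.
        if chunk.done and len(self.received) == chunk.index + 1:
            self.reloaded = True
        return True


if __name__ == "__main__":
    chunks = make_chunks(b"abcdefghij", 4)
    chunks[1].data = b"XXXX"  # corrupt a middle request so its MD5 check fails
    receiver = FlaggedReceiver()
    print([receiver.on_request(c) for c in chunks], receiver.reloaded)
    # prints: [True, False, False] False
```

Compared with the naive model, the failed stream now ends without a reload, leaving the decision to retransmit with the sender.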

Finally, retries currently happen at the level of the entire snapshot directory rather than a single chunk, so a large number of snapshot files may be sent again after one failure. This can be optimized later.
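
The chunk-level retry mentioned above can be sketched as a stop-and-wait sender that resends only the chunk whose reply reports an error, which also illustrates the second suggestion of waiting for the receiver's reply. This is again a hypothetical Python model, not Ratis code: `send_with_chunk_retry`, `make_flaky_send` and `max_retries` are illustrative names, and `send` is assumed to block until the receiver replies.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    """Hypothetical stand-in for one request in the snapshot stream."""
    index: int
    data: bytes
    md5: str
    done: bool


def make_chunks(payload: bytes, size: int) -> list[Chunk]:
    """Split a payload into chunks, marking the final one as done."""
    parts = [payload[i:i + size] for i in range(0, len(payload), size)]
    return [
        Chunk(i, part, hashlib.md5(part).hexdigest(), done=(i == len(parts) - 1))
        for i, part in enumerate(parts)
    ]


def send_with_chunk_retry(
    chunks: list[Chunk],
    send: Callable[[Chunk], bool],
    max_retries: int = 3,
) -> int:
    """Send one chunk at a time, wait for its reply, and resend only that chunk.

    Returns the total number of send attempts; raises if a chunk keeps failing.
    """
    attempts = 0
    for chunk in chunks:
        for _ in range(1 + max_retries):
            attempts += 1
            if send(chunk):  # blocking call: the reply for this chunk
                break
        else:  # no break: every attempt for this chunk failed
            raise RuntimeError(f"chunk {chunk.index} failed after {max_retries} retries")
    return attempts


def make_flaky_send(fail_once: set[int]):
    """Hypothetical transport whose first attempt fails for the given indices."""
    received: dict[int, bytes] = {}

    def send(chunk: Chunk) -> bool:
        if chunk.index in fail_once:
            fail_once.discard(chunk.index)
            return False  # e.g. a transient error or MD5 mismatch on the receiver
        if hashlib.md5(chunk.data).hexdigest() != chunk.md5:
            return False
        received[chunk.index] = chunk.data
        return True

    return send, received


if __name__ == "__main__":
    chunks = make_chunks(b"abcdefghij", 4)
    send, received = make_flaky_send({1})
    print(send_with_chunk_retry(chunks, send), sorted(received))
    # prints: 4 [0, 1, 2]
```

Here a single failure costs one extra send of chunk 1 instead of a resend of every chunk. The trade-off is one round trip per chunk; a real implementation might keep a bounded window of outstanding chunks instead, which this sketch does not attempt.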

Attachments

- image-2024-09-03-14-24-25-652.png (03/Sep/24 06:24, 34 kB, yuuka)
- image-2024-09-03-14-25-22-174.png (03/Sep/24 06:25, 40 kB, yuuka)
- image-2024-09-03-14-27-39-406.png (03/Sep/24 06:27, 40 kB, yuuka)
- image-2024-09-03-14-28-31-529.png (03/Sep/24 06:28, 34 kB, yuuka)
- image-2024-09-03-14-30-02-751.png (03/Sep/24 06:30, 114 kB, yuuka)
- image-2024-09-03-14-33-40-760.png (03/Sep/24 06:33, 285 kB, yuuka)
- image-2024-09-03-14-33-49-573.png (03/Sep/24 06:33, 285 kB, yuuka)

Issue Links

links to: GitHub Pull Request #1145

People

Assignee: yuuka

Reporter: yuuka

Votes: 0

Watchers: 2

Dates

Created: 03/Sep/24 06:41

Updated: 19/Sep/24 16:26

Resolved: 07/Sep/24 02:30

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 2h 10m