Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
The scenario is as follows:
- An InstallSnapshot request is sent, and its processing hangs indefinitely (it is cancelled only by the third request below).
- After a timeout, a second InstallSnapshot request is sent with the same index and term as the first. JRaft handles this case specially: processing of the previous request is NOT cancelled.
- After another timeout, a third InstallSnapshot request is sent with a DIFFERENT index. It cancels processing of the first snapshot, which unblocks the first thread.
In the original JRaft implementation, the first thread fails to clean up after being unblocked. Subsequent retries therefore always see a phantom of an unfinished snapshot, and the snapshot installation process is jammed. Node stop might also hang, because one 'download' task remains unfinished forever.
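The three-request scenario and the missing cleanup can be illustrated with a minimal, self-contained sketch. It does not reproduce JRaft or Ignite code: the class `SnapshotInstallSketch`, the `Download` record, and the methods `register`, `download`, and `replayScenario` are all hypothetical names chosen for illustration. It also assumes that a request with a different index only cancels the hanging download and is itself expected to be retried later. The point it shows is the cleanup in the `finally` block: the unblocked thread removes its own registration, so a later request no longer sees a phantom.

```java
import java.util.concurrent.CountDownLatch;

/** Illustrative only: every name here is hypothetical, not JRaft's or Ignite's API. */
class SnapshotInstallSketch {
    /** One in-flight snapshot download, identified by index and term. */
    static final class Download {
        final long index;
        final long term;
        /** Released when a request with a different index cancels this download. */
        final CountDownLatch cancelled = new CountDownLatch(1);

        Download(long index, long term) {
            this.index = index;
            this.term = term;
        }
    }

    private Download inFlight; // guarded by "this"

    /** Returns a new Download if the caller should start downloading, otherwise null. */
    synchronized Download register(long index, long term) {
        if (inFlight == null) {
            inFlight = new Download(index, term);
            return inFlight;
        }
        if (inFlight.index != index || inFlight.term != term) {
            inFlight.cancelled.countDown(); // different snapshot: cancel the old download
        }
        // Same index and term is treated as a retry; the old download keeps running.
        return null;
    }

    /** Simulates a download that hangs until cancelled, then always cleans up. */
    void download(Download d) {
        try {
            d.cancelled.await(); // stands in for the hanging snapshot copy
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            synchronized (this) {
                if (inFlight == d) {
                    inFlight = null; // without this, later requests see a phantom
                }
            }
        }
    }

    synchronized boolean hasInFlight() {
        return inFlight != null;
    }

    /** Replays the three requests from the description; true if a fourth can start. */
    static boolean replayScenario() {
        SnapshotInstallSketch node = new SnapshotInstallSketch();
        Download first = node.register(10, 1);               // request 1 starts and hangs
        Thread worker = new Thread(() -> node.download(first));
        worker.start();
        boolean retryStarted = node.register(10, 1) != null; // request 2: same index and term
        boolean thirdStarted = node.register(20, 1) != null; // request 3: cancels request 1
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !retryStarted && !thirdStarted && !worker.isAlive()
                && !node.hasInFlight() && node.register(20, 1) != null;
    }

    public static void main(String[] args) {
        System.out.println("fourth request can start: " + replayScenario());
    }
}
```

If the assignment `inFlight = null` in the `finally` block is removed, `replayScenario()` returns false in this sketch, which mirrors the jammed state described above: the stale registration blocks every later install attempt.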
Issue Links
- blocks: IGNITE-18079 Integrate RAFT streaming snapshots (Resolved)
- is duplicated by: IGNITE-18428 After a RAFT snapshot install timed out, subsequent installs consistently failed (Resolved)