Description
Original summary: java.lang.IllegalStateException thrown when nextIndex in follower is inconsistent with leader's snapshot
1. Error Stack
The reason: The leader sets the follower's nextIndex to 0 in https://github.com/apache/ratis/blob/60587b63a4401cc6160907d33fb5cd89dbbdc724/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java#L134 and then sends a snapshot to the follower.
The SnapshotInstallationHandler in the follower asserts that it is a stale snapshot and throws an unrecoverable IllegalStateException.
ISSUE 1
The assertion should compare the snapshot's last included index with the commit index rather than the next index. https://github.com/apache/ratis/blob/60587b63a4401cc6160907d33fb5cd89dbbdc724/ratis-server/src/main/java/org/apache/ratis/server/impl/SnapshotInstallationHandler.java#L177-L179
ISSUE 2
When a follower receives a stale snapshot from the leader, rather than throwing an IllegalStateException from the assertion, can we simply reply ALREADY_INSTALLED?
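A minimal sketch of what ISSUE 1 and ISSUE 2 propose together, assuming a simplified follower that only knows its commit index. The class, enum, and method names are hypothetical and are not the Ratis API; only the ALREADY_INSTALLED reply name comes from this report.

```java
// Hypothetical sketch: illustrative names only, not the actual SnapshotInstallationHandler code.
public class StaleSnapshotSketch {
  /** Replies the follower can give in this sketch. */
  enum Reply { ALREADY_INSTALLED, INSTALL }

  /**
   * Decide how to answer an install-snapshot request.
   * Assumption: a snapshot is stale when everything it covers is already
   * committed locally, i.e. its last included index is <= the commit index.
   */
  static Reply onInstallSnapshot(long commitIndex, long snapshotLastIncludedIndex) {
    if (snapshotLastIncludedIndex <= commitIndex) {
      // Stale snapshot: reply to the leader instead of throwing IllegalStateException.
      return Reply.ALREADY_INSTALLED;
    }
    // The snapshot covers entries that are not committed locally, so install it.
    return Reply.INSTALL;
  }

  public static void main(String[] args) {
    System.out.println(onInstallSnapshot(100, 80)); // snapshot is behind the commit index
    System.out.println(onInstallSnapshot(50, 80));  // snapshot is ahead of the commit index
  }
}
```

With this shape, a stale snapshot becomes an ordinary reply that the leader can handle, rather than a failed assertion on the follower.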
ISSUE 3
When the appendEntriesHandler's onError is called, should we leave nextIndex unchanged? https://github.com/apache/ratis/blob/60587b63a4401cc6160907d33fb5cd89dbbdc724/ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java#L420
onError is called only when something is wrong with the gRPC connection (channel disconnected, timeout, etc.) and the leader receives no reply. I think we can keep nextIndex unchanged and retry. Once communication is restored, let onNext decide the correct nextIndex (the leader may get an INCONSISTENCY reply).
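The behaviour proposed in ISSUE 3 could look roughly like the sketch below. It assumes a per-follower tracker on the leader and a reply that carries the follower's own next index. NextIndexSketch, Result, and the method signatures are hypothetical and do not mirror GrpcLogAppender.

```java
// Hypothetical sketch: illustrative names only, not the actual GrpcLogAppender code.
public class NextIndexSketch {
  /** Reply outcomes modelled in this sketch. */
  enum Result { SUCCESS, INCONSISTENCY }

  private long nextIndex;

  NextIndexSketch(long initialNextIndex) {
    this.nextIndex = initialNextIndex;
  }

  long getNextIndex() {
    return nextIndex;
  }

  /** Transport failure: no reply arrived, so nothing new is known about the follower's log. */
  void onError(Throwable cause) {
    // Intentionally leave nextIndex unchanged; the same entries are retried later.
  }

  /** A reply arrived: it is the only place where nextIndex is corrected. */
  void onNext(Result result, long followerNextIndex) {
    if (result == Result.INCONSISTENCY) {
      // The follower disagrees with the leader's view, so adopt the index it reports.
      nextIndex = followerNextIndex;
    } else {
      // On success, only move forward.
      nextIndex = Math.max(nextIndex, followerNextIndex);
    }
  }

  public static void main(String[] args) {
    NextIndexSketch follower = new NextIndexSketch(42);
    follower.onError(new RuntimeException("channel disconnected"));
    System.out.println(follower.getNextIndex()); // still 42 after the transport error
    follower.onNext(Result.INCONSISTENCY, 17);
    System.out.println(follower.getNextIndex()); // 17, as reported by the follower
  }
}
```

The design choice here is that a transport error carries no information about the follower's log, so only a real reply is allowed to move nextIndex.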