Description
When Ratis is restarted, startup occasionally fails with an error. Two cases have been observed.
For case 1
Looking through the code, I found a problem.
StateMachineUpdater reaches this assertion whenever it applies a member change log entry from a previous term while the Leader exists. At that point the startupEntry for the current term may not have been initialized yet, so the assertion throws an error.
We should only fire this assertion if the log entry's term matches the current term.
In addition, the current implementation triggers notifyLeaderReady several times while applying member change log entries of the current term. This is inconsistent with the semantics of the interface, because the Leader is already in the ready state when the later notifications fire.
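The two points in case 1 can be sketched as follows. This is a minimal illustration and not the Ratis implementation: the class LeaderReadySketch and all of its methods are hypothetical names, and the sketch assumes that the readiness check is invoked once per applied member change log entry. It skips the check for entries from a previous term and fires the notification at most once.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the leader-side readiness logic; not the Ratis API.
final class LeaderReadySketch {
  private final long currentTerm;
  private volatile boolean startupEntryInitialized = false;
  private final AtomicBoolean notified = new AtomicBoolean(false);
  private int notifyCount = 0;

  LeaderReadySketch(long currentTerm) {
    this.currentTerm = currentTerm;
  }

  // Assumed hook: called once the startup entry of the current term exists.
  void markStartupEntryInitialized() {
    startupEntryInitialized = true;
  }

  // Called for every applied member change entry.
  // Returns true only for the single call that fires the ready notification.
  boolean onConfigurationEntryApplied(long entryTerm) {
    if (entryTerm != currentTerm) {
      // Entry from a previous term: the startup entry of the current term
      // may not exist yet, so do not run the assertion at all.
      return false;
    }
    if (!startupEntryInitialized) {
      throw new IllegalStateException(
          "startupEntry is not initialized for term " + currentTerm);
    }
    // compareAndSet guarantees that the notification fires at most once.
    if (notified.compareAndSet(false, true)) {
      notifyCount++;
      return true;
    }
    return false;
  }

  int getNotifyCount() {
    return notifyCount;
  }
}
```

With this guard, an entry from term 4 applied while the current term is 5 is ignored without touching the startup entry, and repeated entries from term 5 produce a single notification.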
For case 2
I noticed that StateMachineUpdater fetching leaderState and RaftServer's changeToFollower run asynchronously. As shown in the log, StateMachineUpdater obtains the leaderStateImpl for term 179 and starts executing checkReady. Meanwhile, the server receives log requests with a larger term, updates the term to 180, and sets leaderStateImpl to null. StateMachineUpdater then runs getCurrentTerm on the stale leaderStateImpl for term 179. In this case, we should get the latest term directly from server.getState().getCurrentTerm() so that we don't get this error.
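The interleaving in case 2 can be reproduced in a simplified, single-threaded form. The sketch below is an assumption-laden illustration, not Ratis code: LeaderSnapshot and ServerSketch are hypothetical classes that only model a term counter and a nullable leader reference. It shows why a leader state captured before the step-down reports an outdated term, while the server state reports the latest one.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of a leader state bound to the term in which it was created.
final class LeaderSnapshot {
  private final long term;

  LeaderSnapshot(long term) {
    this.term = term;
  }

  long getTerm() {
    return term;
  }
}

// Hypothetical model of the server: the authoritative term plus a leader
// reference that becomes null when the server steps down.
final class ServerSketch {
  private final AtomicLong currentTerm;
  private final AtomicReference<LeaderSnapshot> leader;

  ServerSketch(long term) {
    this.currentTerm = new AtomicLong(term);
    this.leader = new AtomicReference<>(new LeaderSnapshot(term));
  }

  LeaderSnapshot getLeader() {
    return leader.get();
  }

  // Always reflects the latest term, even after a step-down.
  long getCurrentTerm() {
    return currentTerm.get();
  }

  // Models stepping down after seeing a request with a larger term.
  void changeToFollower(long newTerm) {
    currentTerm.set(newTerm);
    leader.set(null);
  }

  // Term comparison that reads the term from the server, not from a captured leader.
  boolean isCurrentTerm(long entryTerm) {
    return entryTerm == getCurrentTerm();
  }
}
```

If the updater captures getLeader() at term 179 and changeToFollower(180) runs before the updater reads the term, the captured object still answers 179 while getCurrentTerm() answers 180. Reading the term from the server side avoids depending on the captured object.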
Attachments
Issue Links
- causes
  - RATIS-2055 Changed logic of LeaderState.isReady impacts application (Resolved)
- is related to
  - HDDS-10690 SCMStateMachine Override LeaderEventApi.notifyLeaderReady (Resolved)
- links to