Description
For now, if there is no snapshot existed, the returned snapshotIndex is 0.
It is not correctly, since the raft log starts from index 0 (BTW, in raft paper, the first valid log index is 1, but in ratis, it is 0), a snapshot taken at snapshotIndex 0 should contain the log entry 0. The snapshotIndex indicating an empty/fake snapshot should be -1.
The fix is as follows:
diff --git a/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java b/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java index 728f7e9c..6307ca79 100644 --- a/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java +++ b/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java @@ -431,7 +431,7 @@ class ServerState implements Closeable { long getLatestInstalledSnapshotIndex() { final TermIndex ti = latestInstalledSnapshot.get(); - return ti != null? ti.getIndex(): 0L; + return ti != null? ti.getIndex(): -1L; } /** @@ -440,7 +440,7 @@ class ServerState implements Closeable { */ long getSnapshotIndex() { final SnapshotInfo s = getLatestSnapshot(); - final long latestSnapshotIndex = s != null ? s.getIndex() : 0; + final long latestSnapshotIndex = s != null ? s.getIndex() : -1; return Math.max(latestSnapshotIndex, getLatestInstalledSnapshotIndex()); }
This issue is found in the test of bootstrap SCM in SCM HA.
TL; DR
The issue can be reproduced by follow steps:
1, use follow configuration for SCM HA
<!-- scm snapshot -->
<property>
<name>ozone.scm.ha.ratis.snapshot.threshold</name>
<value>1</value>
</property>
<property>
<name>ozone.scm.ha.ratis.log.purge.gap</name>
<value>5</value>
</property>
<property>
<name>ozone.scm.ha.ratis.log.purge.enabled</name>
<value>true</value>
</property>
2, int and start SCM1
3, start DN1, DN2, DN3
4, wait for a while, until there is dozens of raft log entries.
5, bootstrap SCM2
6, start SCM2, SCM1 will be stuck in appendEntries Timeout.
The RCA is:
after init SCM1, the raft log is
[root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# ls /tmp/metadata/scm-ha/b7a11f50-85b8-4b93-8a6b-c00705b4e3d4/current/ log_inprogress_0 raft-meta raft-meta.conf [root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]#
after SCM1 is started, and there is dozens of log entries generated, the raft log is
[root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# ls /tmp/metadata/scm-ha/b7a11f50-85b8-4b93-8a6b-c00705b4e3d4/current/ log_inprogress_1 raft-meta raft-meta.conf [root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]#
Since purge is enable and purge gap is 5, the file log_0_0 is removed. The leader does not have entry 0 any more.
Then SCM2 is bootstrapped and started, we expect SCM2 to download a snapshot from leader, since leader miss log entry 0, and SCM2 is empty.
The follow SCM do receive an installSnapshot request,
021-04-25 19:25:55,933 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: receive installSnapshot: 630b3fbd-861f-4676-8e16-8ae548ed6658->55560a65-e7f7-46f0-b463-f511187fd358#0-t2,notify:(t:2, i:1)
But due to the default snapshot index is 0,
if (inProgressInstallSnapshotRequest.compareAndSet(null, firstAvailableLogTermIndex)) { // Check if snapshot index is already at par or ahead of the first // available log index of the Leader. long snapshotIndex = state.getSnapshotIndex(); if (snapshotIndex + 1 >= firstAvailableLogIndex) { // State Machine has already installed the snapshot. Return the // latest snapshot index to the Leader. inProgressInstallSnapshotRequest.compareAndSet(firstAvailableLogTermIndex, null); final InstallSnapshotReplyProto reply = ServerProtoUtils.toInstallSnapshotReplyProto( leaderId, getMemberId(), currentTerm, InstallSnapshotResult.ALREADY_INSTALLED, snapshotIndex); LOG.info("{}: StateMachine snapshotIndex is {}", getMemberId(), snapshotIndex); return reply; }
0(snapshotIndex) + 1 == 1(firstAvailableLogIndex), thus follower SCM ignore the downloading.
2021-04-25 19:25:55,933 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: receive installSnapshot: 630b3fbd-861f-4676-8e16-8ae548ed6658->55560a65-e7f7-46f0-b463-f511187fd358#0-t2,notify:(t:2, i:1) 2021-04-25 19:25:55,940 [grpc-default-executor-0] INFO org.apache.hadoop.hdds.scm.ha.SCMStateMachine: leader changed, yet current SCM is still follower. 2021-04-25 19:25:55,940 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: change Leader from null to 630b3fbd-861f-4676-8e16-8ae548ed6658 at term 2 for installSnapshot, leader elected after 1416ms 2021-04-25 19:25:55,954 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: StateMachine snapshotIndex is 0 2021-04-25 19:25:56,019 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: set new configuration index: 1 configurationEntry { peers { id: "630b3fbd-861f-4676-8e16-8ae548ed6658" address: "9.134.50.210:9865" } } from snapshot 2021-04-25 19:25:56,022 [grpc-default-executor-0] DEBUG org.apache.ratis.util.LifeCycle: 630b3fbd-861f-4676-8e16-8ae548ed6658|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0: NEW 2021-04-25 19:25:56,022 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: set configuration 1: [630b3fbd-861f-4676-8e16-8ae548ed6658|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0], old=null 2021-04-25 19:25:56,026 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: reply installSnapshot: 630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED 2021-04-25 19:25:56,026 [grpc-default-executor-0] DEBUG org.apache.ratis.grpc.server.GrpcServerProtocolService: 55560a65-e7f7-46f0-b463-f511187fd358: reply 630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
In the leader side, the matchIndex is updated to 0, and the nextIndex is updated to 1.
@Override public void setSnapshotIndex(long newSnapshotIndex) { snapshotIndex.setUnconditionally(newSnapshotIndex, infoIndexChange); matchIndex.setUnconditionally(newSnapshotIndex, infoIndexChange); nextIndex.setUnconditionally(newSnapshotIndex + 1, infoIndexChange); }
2021-04-25 19:23:58,738 [grpc-default-executor-0] INFO org.apache.ratis.grpc.server.GrpcLogAppender: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358-InstallSnapshotResponseHandler: received the first reply 630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED 2021-04-25 19:23:58,740 [grpc-default-executor-0] INFO org.apache.ratis.grpc.server.GrpcLogAppender: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358-InstallSnapshotResponseHandler: Already Installed Snapshot Index 0. 2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO org.apache.ratis.server.leader.FollowerInfo: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358: snapshotIndex: setUnconditionally 0 -> 0 2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO org.apache.ratis.server.leader.FollowerInfo: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358: matchIndex: setUnconditionally 0 -> 0 2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO org.apache.ratis.server.leader.FollowerInfo: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358: nextIndex: setUnconditionally 0 -> 1 2021-04-25 19:23:58,741 [grpc-default-executor-0] DEBUG org.apache.ratis.server.leader.FollowerInfo: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358: commitIndex: updateToMax old=-1, new=0, updated? true
Since matchIndex is monotonically increased, and the nextIndex will at least be matchIndex + 1,
final long nextIndex = 1 + Optional.ofNullable(request) .map(AppendEntriesRequest::getPreviousLog) .map(TermIndex::getIndex) .orElseGet(getFollower()::getMatchIndex);
Leader keep sending entries from index 1, but follower is empty, missing entry 0, the cluster stuck in inconsistency append entries.
With the fix, the download of checkpoint happened, and the system works well.
2021-04-26 11:23:10,761 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: receive installSnapshot: 6e7bb411-69c5-40ea-8238-bc65cc4e9b55->a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0-t2,notify:(t:2, i:1) 2021-04-26 11:23:10,768 [grpc-default-executor-0] INFO org.apache.hadoop.hdds.scm.ha.SCMStateMachine: leader changed, yet current SCM is still follower. 2021-04-26 11:23:10,768 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: change Leader from null to 6e7bb411-69c5-40ea-8238-bc65cc4e9b55 at term 2 for installSnapshot, leader elected after 1391ms 2021-04-26 11:23:10,787 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: notifyInstallSnapshot: nextIndex is 0 but the leader's first available index is 1. 2021-04-26 11:23:11,090 [grpc-default-executor-0] INFO org.apache.hadoop.hdds.scm.ha.SCMStateMachine: Received install snapshot notification from SCM leader: 9.134.50.210:9865 with term index: (t:2, i:1) 2021-04-26 11:23:11,091 [pool-13-thread-1] INFO org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl: Downloading checkpoint from leader SCM scm1 and reloading state from the checkpoint. 2021-04-26 11:23:11,092 [grpc-default-executor-0] DEBUG org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: Snapshot Installation Request received and is in progress 2021-04-26 11:23:11,164 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: set new configuration index: 1 configurationEntry { peers { id: "6e7bb411-69c5-40ea-8238-bc65cc4e9b55" address: "9.134.50.210:9865" } } from snapshot 2021-04-26 11:23:11,166 [grpc-default-executor-0] DEBUG org.apache.ratis.util.LifeCycle: 6e7bb411-69c5-40ea-8238-bc65cc4e9b55|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0: NEW 2021-04-26 11:23:11,167 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: set configuration 1: [6e7bb411-69c5-40ea-8238-bc65cc4e9b55|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0], old=null 2021-04-26 11:23:11,171 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: reply installSnapshot: 6e7bb411-69c5-40ea-8238-bc65cc4e9b55<-a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0:FAIL-t0,IN_PROGRESS 2021-04-26 11:23:11,171 [grpc-default-executor-0] DEBUG org.apache.ratis.grpc.server.GrpcServerProtocolService: a0ba8aa4-5cff-4452-9adf-638135a9ebaf: reply 6e7bb411-69c5-40ea-8238-bc65cc4e9b55<-a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0:FAIL-t0,IN_PROGRESS 2021-04-26 11:23:11,175 [grpc-default-executor-0] INFO org.apache.ratis.grpc.server.GrpcServerProtocolService: a0ba8aa4-5cff-4452-9adf-638135a9ebaf: Completed INSTALL_SNAPSHOT, lastRequest: 6e7bb411-69c5-40ea-8238-bc65cc4e9b55->a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0-t2,notify:(t:2, i:1)
Attachments
Attachments
Issue Links
- links to