Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Implemented
Description
See RATIS-2089 for the context.
In Ozone's XceiverClientRatis#watchForCommit, there are two watch requests with different ReplicationLevel values:
- Watch for ALL_COMMITTED
- Watch for MAJORITY_COMMITTED (If the previous watch threw an exception)
Based on the reply to the second watch request, the client removes the UUIDs of failed datanodes from commitInfoMap.
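The two-step flow described above can be sketched roughly as follows. This is a simplified, self-contained model, not the actual Ozone or Ratis code: the `Watcher` interface, the `watchCalls` counter, and the rule "remove a datanode whose reported commit index is below the watched index" are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class TwoStepWatchSketch {
  enum ReplicationLevel { ALL_COMMITTED, MAJORITY_COMMITTED }

  /** Hypothetical stand-in for a watch call: returns per-datanode commit indexes or throws. */
  interface Watcher {
    Map<UUID, Long> watch(long index, ReplicationLevel level);
  }

  // Datanode UUID -> last known commit index (models the client's commitInfoMap).
  final Map<UUID, Long> commitInfoMap = new HashMap<>();
  // Counts how many watch requests were sent, to make the extra round trip visible.
  int watchCalls = 0;

  void watchForCommit(long index, Watcher watcher) {
    try {
      watchCalls++;
      watcher.watch(index, ReplicationLevel.ALL_COMMITTED);
    } catch (RuntimeException e) {
      // First watch failed: fall back to a second watch at MAJORITY_COMMITTED.
      watchCalls++;
      Map<UUID, Long> reply = watcher.watch(index, ReplicationLevel.MAJORITY_COMMITTED);
      // Assumed pruning rule: drop datanodes that have not reached the watched index.
      reply.forEach((datanode, committed) -> {
        if (committed < index) {
          commitInfoMap.remove(datanode);
        }
      });
    }
  }
}
```

In this model, a failure of the first watch always costs a second request, even though its reply is only used to prune commitInfoMap.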
The second watch might not be necessary, since the entries in AbstractCommitWatcher.commitIndexMap imply that the PutBlock request has already been committed to the majority of the servers. From my understanding, the second MAJORITY_COMMITTED watch only serves to gather the information needed to remove entries from commitInfoMap.
If the first watch fails with NotReplicatedException, we might be able to remove the need for a second watch request. Since NotReplicatedException is a Raft server exception, we can include the CommitInfoProtos in the NotReplicatedException. The client can use these CommitInfoProtos to remove the entries from commitInfoMap without sending another WATCH request.
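A minimal sketch of this idea, under stated assumptions: `NotReplicatedSketchException` is a hypothetical stand-in for a NotReplicatedException that carries commit information (modeled here as a plain map of datanode UUID to commit index rather than CommitInfoProtos), and the pruning rule "commit index below the watched index" is assumed for illustration. It is not the real Ratis or Ozone API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class SingleWatchSketch {
  /** Hypothetical exception that carries per-server commit indexes from the leader. */
  static class NotReplicatedSketchException extends RuntimeException {
    final Map<UUID, Long> commitInfos;

    NotReplicatedSketchException(Map<UUID, Long> commitInfos) {
      super("not replicated to all servers");
      this.commitInfos = commitInfos;
    }
  }

  /** Hypothetical stand-in for a single ALL_COMMITTED watch call. */
  interface AllCommittedWatcher {
    void watchAllCommitted(long index);
  }

  // Datanode UUID -> last known commit index (models the client's commitInfoMap).
  final Map<UUID, Long> commitInfoMap = new HashMap<>();
  int watchCalls = 0;

  void watchForCommit(long index, AllCommittedWatcher watcher) {
    try {
      watchCalls++;
      watcher.watchAllCommitted(index);
    } catch (NotReplicatedSketchException e) {
      // No second watch: prune using the commit info carried by the exception.
      e.commitInfos.forEach((datanode, committed) -> {
        if (committed < index) {
          commitInfoMap.remove(datanode);
        }
      });
    }
  }
}
```

In this model, lagging datanodes are pruned after exactly one watch request, because the information needed for pruning arrives with the failure itself.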
We can use the CommitInfoProto in NotReplicatedException, introduced in RATIS-2089, to remove the need for the MAJORITY_COMMITTED watch call when NotReplicatedException is thrown from the DN Ratis leader.
This also requires changing the DN Ratis server watch timeout (hdds.ratis.raft.server.watch.timeout) to be lower than the client watch request timeout (hdds.ratis.raft.client.rpc.watch.request.timeout), so that NotReplicatedException is thrown by the server instead of TimeoutException on the client.
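The ordering constraint between the two timeouts can be expressed as a simple check. The class and method names below are hypothetical, and the durations used in any call are placeholders chosen for illustration, not the actual defaults of either configuration key.

```java
import java.time.Duration;

class WatchTimeoutCheck {
  /**
   * Returns true when the server-side watch timeout expires strictly before the
   * client-side watch request timeout, so the server can answer with
   * NotReplicatedException before the client gives up with TimeoutException.
   */
  static boolean serverTimesOutFirst(Duration serverWatchTimeout,
                                     Duration clientWatchRequestTimeout) {
    return serverWatchTimeout.compareTo(clientWatchRequestTimeout) < 0;
  }
}
```

For example, `serverTimesOutFirst(Duration.ofSeconds(10), Duration.ofSeconds(30))` holds, while swapping the two arguments does not.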
Attachments
Issue Links
- is related to
  - HDDS-10972 Reduce the default watch timeout configuration in DatanodeRatisServerConfig (Resolved)
- relates to
  - HDDS-10108 [hsync] Adopt RATIS-1994 to reduce hsync latency (Patch Available)
- requires
  - HDDS-10910 Bump Ratis to 3.1.0 (Resolved)
  - RATIS-2089 Add CommitInfoProto in NotReplicatedException (Resolved)
- links to