[HDDS-11897] Migrating Ozone Manager replication from post Ratis execution to Pre Ratis execution. - ASF JIRA

Epic Name:
Migrating Ozone Manager replication from post Ratis execution to Pre Ratis execution.

The following challenges and solutions are proposed as part of this epic.

The current implementation depends on consensus on the order of requests received rather than on consensus on the processing of the requests.
1. This can lead to subtle bugs due to discrepancies in the actual execution of requests on the leader vs the followers.
The double buffer implementation is meant to optimize the rate at which writes are flushed to RocksDB, but the effective batching achieved is 1.2 at best. It is also a recurring source of bugs and adds complexity for new features.
1. The new implementation will not depend on the double buffer behavior.
The number of transactions that can be pushed through Ratis currently averages around 25k.
1. Requests will be batched before sending them to Ratis for consensus.
Readers and writers are not separated, and there is potential contention between readers and writers.
Although FSO and OBS bucket types can have finer-grained locking, coarse-grained locks are held at the Bucket level.
1. The new implementation will introduce locking at the start of the request processing to serialize requests that must be linearized against each other.
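
As an illustration of batching requests before they are sent for consensus, here is a minimal sketch. It is not the Ozone Manager implementation: `RequestBatcher`, its size-based flush policy, and the `submitter` callback (a stand-in for whatever call hands a batch to Ratis) are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical batcher: collects requests and hands them off in groups,
 * so one consensus round can cover several requests.
 */
final class RequestBatcher<T> {
    private final int maxBatchSize;
    // Assumption: stands in for the call that submits a batch for consensus.
    private final Consumer<List<T>> submitter;
    private final List<T> pending = new ArrayList<>();

    RequestBatcher(int maxBatchSize, Consumer<List<T>> submitter) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.submitter = submitter;
    }

    /** Queues a request and submits the batch once it is full. */
    synchronized void add(T request) {
        pending.add(request);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Submits whatever is queued, if anything. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        submitter.accept(batch);
    }
}
```

A real batcher would likely also flush on a timer so that a partially filled batch does not wait indefinitely; that is omitted here to keep the sketch short.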

Together, these changes and related changes should result in:

Significant performance improvement in the rate of request processing (3x)
Better code quality and test coverage
Elimination of subtle bugs arising from the write-back cache design of double buffer writes post Ratis
Fine-grained locking, so that requests that can be processed in parallel do not block each other.
Separation of resources for readers and writers. This will also help process reads from followers using Ratis' capabilities for linearized reads from followers.
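
The fine-grained locking goal listed above can be sketched as a per-key lock table, where requests on different keys never wait on each other and readers of the same key share a lock. This is only an illustration under assumed names: `KeyLockTable`, `withReadLock`, and `withWriteLock` are hypothetical and do not come from the Ozone code base.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical lock table: one read/write lock per key path. */
final class KeyLockTable {
    // Locks are created on first use and never removed in this sketch;
    // a production version would need eviction or lock striping.
    private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks =
        new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantReadWriteLock());
    }

    /** Runs the action while holding the exclusive lock for this key. */
    <R> R withWriteLock(String key, Supplier<R> action) {
        ReentrantReadWriteLock lock = lockFor(key);
        lock.writeLock().lock();
        try {
            return action.get();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Runs the action while holding the shared lock for this key. */
    <R> R withReadLock(String key, Supplier<R> action) {
        ReentrantReadWriteLock lock = lockFor(key);
        lock.readLock().lock();
        try {
            return action.get();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Taking the lock at the start of request processing, as in `withWriteLock`, serializes only the requests that touch the same key; a request that needs several keys would have to acquire them in a consistent order to avoid deadlock.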

contains

HDDS-11415 Leader Execution at Leader