Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
It is desirable to export data out of the StateMachine itself to ensure that RAFT quorums remain well-performing. After new data stops flowing into the LogService, we can export the log out of the LogService to a distributed filesystem with more suitable storage facilities. Moving these logs out of the LogService helps avoid local disk capacity issues at the Ratis level. Ideally, this would be another extension point for each storage system, providing a common LogService API and easing adoption by downstream applications.
Work Breakdown
- Configure MetadataStateMachine with a “remote storage” location (e.g. hdfs://localhost:8020/ratis-logs). This is “cold” storage where we can place a log after we disallow writes to it (that is, once it moves to the “CLOSED” state).
- When the MetadataStateMachine moves a log to the CLOSED state, it must queue the log for upload to the remote storage location. This can be accompanied by a new state, e.g. “ARCHIVED”.
- When clients try to read a log which is “ARCHIVED”, they must know to read from this location in remote storage, instead of from the LogStateMachine as before.
- Provide a LogStream implementation that gives clients the same API whether they read from the LogStateMachine (as they do today) or from the remote storage location.
- Optional: rewrite the RAFT log into a more optimal form when writing it to the remote storage location. E.g., we don’t need to retain Ratis-internal log messages.
Thanks elserj for the above write-up.
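The lifecycle in the work breakdown can be sketched roughly as follows. This is only an illustration of the proposed OPEN → CLOSED → ARCHIVED flow and the read routing, not the actual LogService code: the class `LogArchiveSketch`, all of its methods, the `statemachine:` prefix, and the `<remote root>/<log name>` path layout are hypothetical names assumed for this example, and the upload itself is elided.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch; none of these names come from the Ratis LogService code. */
class LogArchiveSketch {
  /** Log lifecycle states; ARCHIVED is the new state proposed above. */
  enum LogState { OPEN, CLOSED, ARCHIVED }

  // Assumed to be configured on the metadata side, e.g. hdfs://localhost:8020/ratis-logs
  private final String remoteStorageRoot;
  private final Map<String, LogState> states = new HashMap<>();
  private final Queue<String> archiveQueue = new ArrayDeque<>();

  LogArchiveSketch(String remoteStorageRoot) {
    this.remoteStorageRoot = remoteStorageRoot;
  }

  void createLog(String name) {
    states.put(name, LogState.OPEN);
  }

  /** Closing a log disallows further writes and queues it for upload. */
  void closeLog(String name) {
    requireState(name, LogState.OPEN);
    states.put(name, LogState.CLOSED);
    archiveQueue.add(name);
  }

  /** Processes one queued log; returns its name, or null if nothing is queued. */
  String archiveNext() {
    String name = archiveQueue.poll();
    if (name == null) {
      return null;
    }
    requireState(name, LogState.CLOSED);
    // A real implementation would copy the log to remote storage here
    // before marking it ARCHIVED; the copy is omitted in this sketch.
    states.put(name, LogState.ARCHIVED);
    return name;
  }

  /** Tells a reader where to fetch the log from, based on its state. */
  String readLocation(String name) {
    LogState state = states.get(name);
    if (state == null) {
      throw new IllegalArgumentException("Unknown log: " + name);
    }
    if (state == LogState.ARCHIVED) {
      return remoteStorageRoot + "/" + name; // assumed path layout
    }
    return "statemachine:" + name; // placeholder for "read from the LogStateMachine"
  }

  LogState stateOf(String name) {
    return states.get(name);
  }

  private void requireState(String name, LogState expected) {
    LogState actual = states.get(name);
    if (actual != expected) {
      throw new IllegalStateException(
          "Log " + name + " is " + actual + ", expected " + expected);
    }
  }
}
```

In this sketch a client-facing LogStream would call something like `readLocation` once and then open either a state-machine reader or a remote-storage reader behind the same interface, so callers would not need to know whether a log has been archived.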