Description
Currently, the install snapshot notification does not inform the state machine when the whole installation process is done.
On the Ozone side, the state machine's notifyInstallSnapshotFromLeader handles each notification as a single, self-contained request. This worked fine until we found that snapshot installation could get stuck because the whole RocksDB is replaced each time: the leader may purge the Raft log while the snapshot is being transferred, which triggers another snapshot installation as soon as the previous install request is done. To solve this, we came up with the incremental snapshot idea: the next install request transfers only the incremental part of RocksDB. Because the incremental part is determined by comparing checkpoints, the checkpoints must be preserved and cannot be deleted after the first request to install a snapshot.
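To illustrate why the earlier checkpoint has to survive the first install request, here is a minimal sketch of the comparison step. It assumes a checkpoint can be described by the set of file names it contains; the class CheckpointDiff and the method filesToTransfer are hypothetical names for illustration and are not part of Ratis or Ozone.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not an existing Ratis or Ozone class.
class CheckpointDiff {
  // Returns the files present in the leader's checkpoint that the
  // follower's preserved checkpoint does not have yet. Only these
  // files would need to be sent in the next install request.
  static Set<String> filesToTransfer(Set<String> leaderFiles,
                                     Set<String> followerFiles) {
    Set<String> missing = new TreeSet<>(leaderFiles);
    missing.removeAll(followerFiles);
    return missing;
  }
}
```

If the follower had already deleted its checkpoint, followerFiles would be empty and the whole RocksDB would have to be transferred again.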
The right time to clean up these checkpoints is hard to determine. The follower cannot easily tell whether the latest installed snapshot is the last one, after which it could start applying logs immediately. The cleanup time depends on the leader's state: only the leader knows whether it is time to send the snapshot notification again or to just send append entries. Cleanup should be triggered only when the leader considers the follower to have caught up (error cases are not covered here).
Thus, we should have an event that triggers the cleanup of the checkpoints for Ozone or, more generally, signals that the install snapshot process is complete, meaning that no more install snapshot requests will be sent and the follower has caught up.
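The proposed behavior could be sketched on the follower side as follows. This is only an illustration under stated assumptions: CheckpointTracker, onSnapshotInstalled and onInstallSnapshotFinished are hypothetical names, and the actual event name and signature in Ratis are not defined by this description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical follower-side tracker, not an existing Ratis or Ozone class.
class CheckpointTracker {
  private final List<String> retained = new ArrayList<>();

  // Called after each install snapshot request has been processed.
  // The checkpoint is kept because a later incremental request may
  // still need it for comparison.
  void onSnapshotInstalled(String checkpointPath) {
    retained.add(checkpointPath);
  }

  // Assumed completion event: the leader considers the follower caught
  // up and will send no more install snapshot requests. Returns the
  // checkpoints that are now safe to delete and stops tracking them.
  List<String> onInstallSnapshotFinished() {
    List<String> toDelete = new ArrayList<>(retained);
    retained.clear();
    return toDelete;
  }

  int retainedCount() {
    return retained.size();
  }
}
```

The point of the design is that cleanup is driven by the leader's decision rather than by the follower guessing after each request.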