Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The legacy replication manager internally keeps a list of all pending replications and deletes. Each time a container is checked, it checks this list and removes any replications that have completed or expired. It then gets the list of remaining pending operations to help decide whether the container is healthy.
Rather than having the ReplicationManager remove the completed and expired replications, we could have a standalone PendingContainerOps monitor that works as follows:
1. Replication Manager adds pending replications and deletes to it.
2. Replication Manager queries it for anything pending for the current container and gets a list of PendingActions back.
3. The PendingReplicationMonitor has its own internal thread that checks for expired replications and removes them.
4. Completed replications and deletes are removed in ContainerManagerImpl, which has add and removeContainer methods triggered via the container reports (ICR and FCR) sent by the datanodes as containers are replicated.
This way, the ReplicationManager does not need to worry about expiring replications or removing completed entries. We also get a more up-to-date view of the system, as the ICRs and FCRs will keep the pending table up to date in real time, rather than having to wait for the container to be re-checked inside the replication manager.
We can have a fairly simple "ContainerReplicaPendingOps" class that is basically standalone and inject it into ReplicationManager and ContainerManagerImpl. This would allow for removing some complexity from RM and let the expiry and completion be tested in an isolated way.
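As an illustration only, the four steps above could be sketched roughly as follows. This is not the actual Ozone implementation: the class name `PendingOpsSketch`, the methods `schedule`, `complete` and `removeExpired`, the use of a plain `String` for the datanode, and the injected clock are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a standalone pending-ops table; names are illustrative. */
class PendingOpsSketch {
  enum OpType { ADD, DELETE }

  /** One pending replication (ADD) or delete (DELETE) targeting a datanode. */
  static final class PendingOp {
    final OpType type;
    final String datanode;
    final long scheduledAtMillis;

    PendingOp(OpType type, String datanode, long scheduledAtMillis) {
      this.type = type;
      this.datanode = datanode;
      this.scheduledAtMillis = scheduledAtMillis;
    }
  }

  private final Map<Long, List<PendingOp>> pending = new ConcurrentHashMap<>();
  // Assumption: the clock is injected so expiry can be tested without sleeping.
  private final LongSupplier clock;

  PendingOpsSketch(LongSupplier clock) {
    this.clock = clock;
  }

  /** Step 1: the replication manager records a pending replication or delete. */
  void schedule(long containerId, OpType type, String datanode) {
    PendingOp op = new PendingOp(type, datanode, clock.getAsLong());
    pending.compute(containerId, (id, ops) -> {
      List<PendingOp> updated = (ops == null) ? new ArrayList<>() : ops;
      updated.add(op);
      return updated;
    });
  }

  /** Step 2: the replication manager asks what is pending for one container. */
  List<PendingOp> getPendingOps(long containerId) {
    List<PendingOp> snapshot = new ArrayList<>();
    // Copy inside computeIfPresent so the copy is atomic per container.
    pending.computeIfPresent(containerId, (id, ops) -> {
      snapshot.addAll(ops);
      return ops;
    });
    return snapshot;
  }

  /** Step 4: a container report confirms an op; returns true if an entry was removed. */
  boolean complete(long containerId, OpType type, String datanode) {
    boolean[] removed = {false};
    pending.computeIfPresent(containerId, (id, ops) -> {
      removed[0] = ops.removeIf(op -> op.type == type && op.datanode.equals(datanode));
      return ops.isEmpty() ? null : ops; // returning null drops the empty entry
    });
    return removed[0];
  }

  /** Step 3: called periodically by a background thread; returns the number expired. */
  int removeExpired(long timeoutMillis) {
    long cutoff = clock.getAsLong() - timeoutMillis;
    int[] removed = {0};
    for (Long containerId : pending.keySet()) {
      pending.computeIfPresent(containerId, (id, ops) -> {
        int before = ops.size();
        ops.removeIf(op -> op.scheduledAtMillis < cutoff);
        removed[0] += before - ops.size();
        return ops.isEmpty() ? null : ops;
      });
    }
    return removed[0];
  }
}
```

In a setup like this, the same instance would be passed to both the ReplicationManager (steps 1 and 2) and ContainerManagerImpl (step 4), while something like a `ScheduledExecutorService` could call `removeExpired` on a fixed interval. Injecting the clock is one possible way to get the isolated expiry tests mentioned above.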
Issue Links
- is related to: HDDS-6771 EC: ReplicationManager - make ContainerReplicaPendingOps into a SCM service (Resolved)