  1. Apache Ozone
  2. HDDS-6462 Phase II : Erasure Coding Offline Recovery & Read/Write Improvements
  3. HDDS-6744

EC: ReplicationManager - create ContainerReplicaPendingOps class and integrate with ContainerManager


Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.3.0
    • SCM

    Description

      The legacy replication manager internally keeps a list of all pending replications and deletes. Each time a container is checked, it checks this list and removes any replications that have completed or expired. It then uses the remaining pending operations to help decide whether the container is healthy.

      Rather than having the ReplicationManager remove the completed and expired replications, we could have a standalone PendingContainerOps monitor that works as follows:

      1. Replication Manager adds pending replications and deletes to it.
      2. Replication Manager queries it for anything pending for the current container and gets a list of PendingActions back.
      3. The PendingReplicationMonitor has its own internal thread that checks for expired replications and removes them.
      4. Completed replications and deletes are removed in ContainerManagerImpl, whose add and removeContainer calls are triggered by the container reports (ICR and FCR) that the datanodes send as containers are replicated.
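
      The four steps above can be sketched roughly as follows. This is only an illustration of the proposal, not the class that was eventually committed: the name PendingOpsSketch, the method names, the use of a plain String for the datanode, and the millisecond clock supplier are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// Hypothetical sketch of the pending-ops table; not Ozone's actual API.
class PendingOpsSketch {
  enum OpType { ADD, DELETE }

  // One replication (ADD) or delete (DELETE) scheduled on a datanode.
  static final class PendingOp {
    final OpType type;
    final String datanode;
    final long scheduledAtMillis;

    PendingOp(OpType type, String datanode, long scheduledAtMillis) {
      this.type = type;
      this.datanode = datanode;
      this.scheduledAtMillis = scheduledAtMillis;
    }
  }

  private final Map<Long, List<PendingOp>> pending = new ConcurrentHashMap<>();
  private final LongSupplier clockMillis; // injectable so tests control time

  PendingOpsSketch(LongSupplier clockMillis) {
    this.clockMillis = clockMillis;
  }

  // Step 1: the replication manager records an operation it has scheduled.
  void schedule(long containerId, OpType type, String datanode) {
    pending.compute(containerId, (id, ops) -> {
      List<PendingOp> updated = ops == null ? new ArrayList<>() : new ArrayList<>(ops);
      updated.add(new PendingOp(type, datanode, clockMillis.getAsLong()));
      return updated;
    });
  }

  // Step 2: the replication manager asks what is still pending for a container.
  List<PendingOp> getPendingOps(long containerId) {
    return Collections.unmodifiableList(
        pending.getOrDefault(containerId, Collections.emptyList()));
  }

  // Step 3: called periodically by a monitor thread; drops ops older than the timeout.
  int removeExpired(long timeoutMillis) {
    long cutoff = clockMillis.getAsLong() - timeoutMillis;
    int total = 0;
    for (long id : pending.keySet()) {
      total += removeMatching(id, op -> op.scheduledAtMillis < cutoff);
    }
    return total;
  }

  // Step 4: a container report confirms the op; returns true if an entry was removed.
  boolean complete(long containerId, OpType type, String datanode) {
    return removeMatching(containerId,
        op -> op.type == type && op.datanode.equals(datanode)) > 0;
  }

  // Atomically filters one container's list and returns how many entries were dropped.
  private int removeMatching(long containerId, Predicate<PendingOp> match) {
    int[] removed = {0};
    pending.computeIfPresent(containerId, (id, ops) -> {
      List<PendingOp> kept = new ArrayList<>(ops);
      kept.removeIf(match);
      removed[0] = ops.size() - kept.size();
      return kept.isEmpty() ? null : kept; // returning null removes the map entry
    });
    return removed[0];
  }

  public static void main(String[] args) {
    PendingOpsSketch ops = new PendingOpsSketch(System::currentTimeMillis);
    ops.schedule(1L, OpType.ADD, "dn1");
    ops.complete(1L, OpType.ADD, "dn1"); // as if a container report had arrived
    System.out.println(ops.getPendingOps(1L).size());
  }
}
```

      In this sketch each update copies the container's list inside a ConcurrentHashMap compute call, so a reader never sees a half-modified list. The internal expiry thread from step 3 is left out; something like a scheduled executor calling removeExpired would be one way to drive it.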

      This way, the ReplicationManager does not need to worry about expiring replications or removing completed entries. We also get a more up-to-date view of the system, as the ICRs / FCRs will keep the pending table current in real time, rather than having to wait for the container to be re-checked inside the replication manager.

      We can have a fairly simple, essentially standalone "ContainerReplicaPendingOps" class and inject it into ReplicationManager and ContainerManagerImpl. This would remove some complexity from RM and let expiry and completion be tested in isolation.
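
      As a rough illustration of the injection idea, the sketch below shares one pending table between two consumers through their constructors. Everything here is hypothetical: PendingAdds, ReplicationCheck and ReportHandler are simplified stand-ins for the pending-ops class, ReplicationManager and ContainerManagerImpl, and the table only counts pending adds per container, ignoring datanode identity, deletes and expiry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of sharing one pending table between two components.
class InjectionSketch {

  // Simplified stand-in for the pending-ops table: counts pending adds per container.
  static final class PendingAdds {
    private final Map<Long, Integer> counts = new ConcurrentHashMap<>();

    void scheduled(long containerId) {
      counts.merge(containerId, 1, Integer::sum);
    }

    void completed(long containerId) {
      // Decrement, removing the entry once nothing is pending.
      counts.computeIfPresent(containerId, (id, n) -> n > 1 ? n - 1 : null);
    }

    int pendingFor(long containerId) {
      return counts.getOrDefault(containerId, 0);
    }
  }

  // Stand-in for ReplicationManager: only adds to and reads from the table.
  static final class ReplicationCheck {
    private final PendingAdds pending;
    private final int requiredReplicas;

    ReplicationCheck(PendingAdds pending, int requiredReplicas) {
      this.pending = pending;
      this.requiredReplicas = requiredReplicas;
    }

    // Schedules only the copies not already in flight; returns how many were scheduled.
    int check(long containerId, int liveReplicas) {
      int missing = requiredReplicas - liveReplicas - pending.pendingFor(containerId);
      for (int i = 0; i < missing; i++) {
        pending.scheduled(containerId);
      }
      return Math.max(missing, 0);
    }
  }

  // Stand-in for ContainerManagerImpl: clears an entry when a report shows a new replica.
  static final class ReportHandler {
    private final PendingAdds pending;

    ReportHandler(PendingAdds pending) {
      this.pending = pending;
    }

    void onReplicaReported(long containerId) {
      pending.completed(containerId);
    }
  }

  public static void main(String[] args) {
    PendingAdds shared = new PendingAdds();
    ReplicationCheck rm = new ReplicationCheck(shared, 3);
    ReportHandler handler = new ReportHandler(shared);
    System.out.println(rm.check(7L, 1));       // copies scheduled on the first check
    handler.onReplicaReported(7L);             // a report completes one of them
    System.out.println(shared.pendingFor(7L)); // copies still in flight
  }
}
```

      Because both consumers only see the shared table, each can be exercised in a test with a fresh table and no dependency on the other.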


            People

              sodonnell Stephen O'Donnell
              sodonnell Stephen O'Donnell
