  1. Apache Ozone
  2. HDDS-8699 Further Replication Manager Improvements
  3. HDDS-8536

ReplicationManager: Unhealthy replicas could block Ratis containers being recovered

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: SCM

    Description

      As with HDDS-8535, a small cluster can leave a Ratis container stuck. For example, on a 4-node cluster where a Ratis container has 2 unhealthy replicas, the Replication Manager (RM) currently creates one new replica, which leaves all 4 nodes in use: 2 holding healthy replicas and 2 holding unhealthy ones. Because unhealthy replicas are removed only after all over- and under-replication has been resolved, and there is no spare node left to host another healthy replica, the container remains stuck in this state.

      To avoid this, when there are too few spare nodes and the container has unhealthy replicas, the under-replication handler may need to call into the unhealthy-replica handler to remove some of the unhealthy replicas so that recovery can make progress.
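
The proposed decision can be illustrated with a minimal sketch. This is not the Ozone implementation: the class `UnderReplicationSketch`, the enums `State` and `Action`, and the method `nextAction` are hypothetical names invented for illustration. The sketch assumes each node holds at most one replica of the container, and it ignores pending operations, placement policy, and node health.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Ozone.
final class UnderReplicationSketch {

  enum State { HEALTHY, UNHEALTHY }

  enum Action { NONE, REPLICATE, DELETE_UNHEALTHY_FIRST, WAIT }

  // Decide the next step for one container.
  // Assumption: one replica per node, so spare nodes = cluster size - replica count.
  static Action nextAction(List<State> replicas, int clusterNodes, int replicationFactor) {
    long healthy = replicas.stream().filter(s -> s == State.HEALTHY).count();
    long unhealthy = replicas.size() - healthy;

    if (healthy >= replicationFactor) {
      return Action.NONE; // not under-replicated; unhealthy cleanup happens elsewhere
    }
    int spareNodes = clusterNodes - replicas.size();
    if (spareNodes > 0) {
      return Action.REPLICATE; // a free node can host a new healthy replica
    }
    if (unhealthy > 0) {
      // No free node: remove an unhealthy replica first to make room.
      return Action.DELETE_UNHEALTHY_FIRST;
    }
    return Action.WAIT; // no free node and nothing to remove
  }

  public static void main(String[] args) {
    // Walk through the 4-node scenario from the description (replication factor 3).
    List<State> replicas = new ArrayList<>(
        List.of(State.HEALTHY, State.UNHEALTHY, State.UNHEALTHY));
    Action action;
    while ((action = nextAction(replicas, 4, 3)) != Action.NONE && action != Action.WAIT) {
      System.out.println(replicas + " -> " + action);
      if (action == Action.REPLICATE) {
        replicas.add(State.HEALTHY);
      } else {
        replicas.remove(State.UNHEALTHY); // removes the first unhealthy replica
      }
    }
    System.out.println(replicas + " -> " + action);
  }
}
```

Under these assumptions, the state with 2 healthy and 2 unhealthy replicas on 4 nodes yields a delete step instead of a stall, after which the freed node can host the third healthy replica.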

            People

              Assignee: Stephen O'Donnell (sodonnell)
              Reporter: Stephen O'Donnell (sodonnell)