Apache Ozone / HDDS-8699 Further Replication Manager Improvements / HDDS-9595

Investigate QUASI_CLOSED containers with only one UNHEALTHY and empty replica

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SCM
    • Labels: None

    Description

      We've seen some QUASI_CLOSED containers that have only one replica, and that replica is both UNHEALTHY and empty. We need to investigate how the system ends up with such containers and what to do about them. The solution will likely depend on whether SCM knows that the container has zero keys. More generally, this ties into a larger problem: SCM cannot tell which apparently empty containers are actually non-empty but missing data, because it does not know whether any keys are mapped to a given container.

      Currently, this is the logic we use to decide that a container is empty:

          return container.getState() == HddsProtos.LifeCycleState.CLOSED &&
              !replicas.isEmpty() &&
              replicas.stream().allMatch(
                  r -> r.getState() == ContainerReplicaProto.State.CLOSED &&
                      r.isEmpty());
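      To make the gap concrete, here is a minimal, self-contained Java sketch. LifeCycleState, ReplicaState and Replica are hypothetical stand-ins for HddsProtos.LifeCycleState, ContainerReplicaProto.State and ContainerReplica (only a few states are modeled), and matchesThisIssue is a hypothetical filter for the containers described above, not part of Ozone. isEmpty mirrors the quoted logic.

```java
import java.util.List;

// Hypothetical, self-contained stand-ins for Ozone's container and replica
// types. These are NOT the Ozone classes; they only mirror the quoted check.
public class EmptyContainerCheck {

  // Stand-in for HddsProtos.LifeCycleState (subset of states, for illustration).
  enum LifeCycleState { OPEN, QUASI_CLOSED, CLOSED }

  // Stand-in for ContainerReplicaProto.State (subset of states, for illustration).
  enum ReplicaState { OPEN, QUASI_CLOSED, CLOSED, UNHEALTHY }

  // Stand-in for ContainerReplica: a replica state plus an "is empty" flag.
  static final class Replica {
    final ReplicaState state;
    final boolean empty;

    Replica(ReplicaState state, boolean empty) {
      this.state = state;
      this.empty = empty;
    }
  }

  // Mirrors the quoted logic: the container must be CLOSED, have at least one
  // replica, and every replica must be CLOSED and empty. The non-empty guard
  // matters because Stream.allMatch is vacuously true for an empty stream.
  static boolean isEmpty(LifeCycleState containerState, List<Replica> replicas) {
    return containerState == LifeCycleState.CLOSED
        && !replicas.isEmpty()
        && replicas.stream().allMatch(
            r -> r.state == ReplicaState.CLOSED && r.empty);
  }

  // Hypothetical filter for the containers this issue is about:
  // QUASI_CLOSED, exactly one replica, and that replica UNHEALTHY and empty.
  static boolean matchesThisIssue(LifeCycleState containerState, List<Replica> replicas) {
    return containerState == LifeCycleState.QUASI_CLOSED
        && replicas.size() == 1
        && replicas.get(0).state == ReplicaState.UNHEALTHY
        && replicas.get(0).empty;
  }

  public static void main(String[] args) {
    List<Replica> oneUnhealthyEmpty = List.of(new Replica(ReplicaState.UNHEALTHY, true));
    List<Replica> oneClosedEmpty = List.of(new Replica(ReplicaState.CLOSED, true));

    // The filter flags the container described in this issue...
    System.out.println(matchesThisIssue(LifeCycleState.QUASI_CLOSED, oneUnhealthyEmpty)); // true
    // ...but the emptiness check rejects it: neither the container nor the
    // replica is CLOSED.
    System.out.println(isEmpty(LifeCycleState.QUASI_CLOSED, oneUnhealthyEmpty)); // false
    // A CLOSED container whose replicas are all CLOSED and empty passes.
    System.out.println(isEmpty(LifeCycleState.CLOSED, oneClosedEmpty)); // true
  }
}
```

      Because the check requires both the container and every replica to be CLOSED, a QUASI_CLOSED container whose only replica is UNHEALTHY and empty is never classified as empty by it. The check also looks only at replica-level flags, so on its own it cannot distinguish a genuinely empty container from one whose data is missing, which is the larger problem described above.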
      
      

            People

              Assignee: Unassigned
              Reporter: Siddhant Sangwan (siddhant)
              Votes: 0
              Watchers: 2
