  Apache Ozone / HDDS-8699 Further Replication Manager Improvements / HDDS-9595

Investigate QUASI_CLOSED containers with only one UNHEALTHY and empty replica


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SCM
    • Labels: None

    Description

      We have seen some QUASI_CLOSED containers that have only one replica, and that replica is both UNHEALTHY and empty. We need to investigate how the system ends up with such containers and what to do about them. The solution will likely depend on whether SCM knows the container to have zero keys. More generally, this ties into the larger problem that SCM cannot tell which apparently empty containers are genuinely empty and which have missing data, because it does not know whether any keys are mapped to a given container.
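      To make the reported pattern concrete, here is a minimal Java sketch of a check that flags it: a QUASI_CLOSED container whose only replica is UNHEALTHY and empty. The enums, the `Replica` record and the method name are hypothetical, simplified stand-ins and not Ozone's actual classes (Ozone uses `HddsProtos.LifeCycleState` and `ContainerReplicaProto.State`, which have more values). Records require Java 16 or later.

```java
import java.util.List;

public class QuasiClosedSingleReplicaCheck {

  // Hypothetical, simplified subsets of the container and replica states.
  enum ContainerState { OPEN, CLOSING, QUASI_CLOSED, CLOSED }
  enum ReplicaState { OPEN, CLOSING, QUASI_CLOSED, CLOSED, UNHEALTHY }

  // Hypothetical stand-in for a replica: its state and whether it holds no data.
  record Replica(ReplicaState state, boolean isEmpty) {}

  // Hypothetical helper: true only for the pattern described in this issue,
  // a QUASI_CLOSED container with exactly one replica that is UNHEALTHY and empty.
  static boolean isSingleUnhealthyEmptyQuasiClosed(ContainerState state,
      List<Replica> replicas) {
    return state == ContainerState.QUASI_CLOSED
        && replicas.size() == 1
        && replicas.get(0).state() == ReplicaState.UNHEALTHY
        && replicas.get(0).isEmpty();
  }

  public static void main(String[] args) {
    // The reported pattern: matches.
    List<Replica> reported = List.of(new Replica(ReplicaState.UNHEALTHY, true));
    System.out.println(
        isSingleUnhealthyEmptyQuasiClosed(ContainerState.QUASI_CLOSED, reported)); // true

    // A single UNHEALTHY replica that still holds data: does not match.
    List<Replica> nonEmpty = List.of(new Replica(ReplicaState.UNHEALTHY, false));
    System.out.println(
        isSingleUnhealthyEmptyQuasiClosed(ContainerState.QUASI_CLOSED, nonEmpty)); // false
  }
}
```

      A check along these lines might help enumerate affected containers during the investigation. It only detects the pattern; it cannot tell whether the container really had no keys, which is the underlying question raised in the description.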

      Currently, this is our logic for deciding whether a container is empty:

          return container.getState() == HddsProtos.LifeCycleState.CLOSED &&
              !replicas.isEmpty() &&
              replicas.stream().allMatch(
                  r -> r.getState() == ContainerReplicaProto.State.CLOSED &&
                      r.isEmpty());
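      The following self-contained sketch reproduces the predicate above against simplified, hypothetical stand-ins for the Ozone types (the enums, the `Replica` record and `isContainerEmpty` are illustrative, not Ozone's API). It shows why the containers described in this issue are never treated as empty: the check requires the container itself to be CLOSED and every replica to be CLOSED, so a QUASI_CLOSED container with a single UNHEALTHY replica fails on both counts, even though that replica is empty.

```java
import java.util.List;

public class EmptyContainerCheck {

  // Hypothetical, simplified subsets of HddsProtos.LifeCycleState and
  // ContainerReplicaProto.State, for illustration only.
  enum LifeCycleState { OPEN, CLOSING, QUASI_CLOSED, CLOSED }
  enum ReplicaState { OPEN, CLOSING, QUASI_CLOSED, CLOSED, UNHEALTHY }

  // Hypothetical stand-in for a container replica.
  record Replica(ReplicaState state, boolean isEmpty) {}

  // Mirrors the predicate above: the container must be CLOSED, have at least
  // one replica, and every replica must be CLOSED and empty.
  static boolean isContainerEmpty(LifeCycleState containerState,
      List<Replica> replicas) {
    return containerState == LifeCycleState.CLOSED
        && !replicas.isEmpty()
        && replicas.stream().allMatch(
            r -> r.state() == ReplicaState.CLOSED && r.isEmpty());
  }

  public static void main(String[] args) {
    // All replicas CLOSED and empty: classified as empty.
    System.out.println(isContainerEmpty(LifeCycleState.CLOSED,
        List.of(new Replica(ReplicaState.CLOSED, true),
            new Replica(ReplicaState.CLOSED, true)))); // true

    // The case from this issue: QUASI_CLOSED, one UNHEALTHY empty replica.
    System.out.println(isContainerEmpty(LifeCycleState.QUASI_CLOSED,
        List.of(new Replica(ReplicaState.UNHEALTHY, true)))); // false

    // Even a CLOSED container is rejected if its only replica is UNHEALTHY.
    System.out.println(isContainerEmpty(LifeCycleState.CLOSED,
        List.of(new Replica(ReplicaState.UNHEALTHY, true)))); // false
  }
}
```

      The `!replicas.isEmpty()` guard matters because `Stream.allMatch` returns `true` for an empty stream, so without it a container with no replicas at all would be classified as empty.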
      
      


          People

            Assignee: Unassigned
            Reporter: Siddhant Sangwan

