Apache Ozone / HDDS-6344

Better handling of incorrect but recoverable containers

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Ozone Datanode, SCM
    • Labels: None

    Description

      Currently, when a container has a problem, we must choose whether or not to mark it as unhealthy. Once a container is marked unhealthy, it can never leave this state and will not be replicated. If containers never had bugs, this binary healthy/unhealthy state would probably suffice. However, as we saw with HDDS-6235, bugs can leave containers in a bad state that is recoverable with manual intervention. It would be useful for SCM to keep replicating and maintaining these containers while logging the issues it finds, so that we can identify and fix problems faster and with less risk of data loss.

      With this feature we could make the container scrubber less aggressive and enable it by default to find bugs. We could also use it to flag non-fatal issues on container import/export.
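
      To make the idea concrete, here is a minimal sketch of what a three-way health state could look like. Every name in it (ContainerHealthSketch, Health, Report, shouldReplicate, findingsToLog) is hypothetical and invented for illustration; none of it is existing Ozone or SCM API, and the real design may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; these types are not part of Ozone. */
final class ContainerHealthSketch {

  /** Assumed three-way state replacing the binary healthy/unhealthy flag. */
  enum Health {
    HEALTHY,      // no problems found
    RECOVERABLE,  // incorrect, but fixable with manual intervention
    UNHEALTHY     // terminal: never leaves this state, never replicated
  }

  /** Illustrative container report; the fields are assumptions. */
  static final class Report {
    final long containerId;
    final Health health;
    final String issue; // human-readable finding, null when HEALTHY

    Report(long containerId, Health health, String issue) {
      this.containerId = containerId;
      this.health = health;
      this.issue = issue;
    }
  }

  /** Replicate everything except terminally unhealthy containers. */
  static boolean shouldReplicate(Health health) {
    return health != Health.UNHEALTHY;
  }

  /** Builds one log line per recoverable container so issues stay visible. */
  static List<String> findingsToLog(List<Report> reports) {
    List<String> lines = new ArrayList<>();
    for (Report r : reports) {
      if (r.health == Health.RECOVERABLE) {
        lines.add("container " + r.containerId + " recoverable issue: " + r.issue);
      }
    }
    return lines;
  }
}
```

      Under these assumptions, a scrubber or an import/export check could report RECOVERABLE for non-fatal findings instead of condemning the container, and SCM would keep its replicas maintained while the logged issue is investigated.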

      cc hanishakoneru avijayan 

          People

            Assignee: Unassigned
            Reporter: Ethan Rose (erose)