HDDS-8062 (Apache Ozone; sub-task of HDDS-7364 Improved container scanning)

Persist reason for container replica being marked unhealthy

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 1.4.0
    • None

    Description

      Once the scanner marks a container replica unhealthy, it would help debugging to persist the reason it was marked unhealthy. Messages written only to the main datanode log eventually roll off, and finding them requires extra filtering to work out what happened.

      Reasons for marking unhealthy include:

      • Corrupted block (and which block was corrupted)
      • Corrupted container metadata file
      • Volume failure
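
      To make the list above concrete, the following is a minimal sketch of how such a reason could be modeled as one persisted line. Every name here (`UnhealthyReasonSketch`, `Reason`, `Event`, the `containerID=`/`reason=`/`blockID=` keys) is hypothetical and chosen for illustration; none of it is taken from the Ozone code base.

```java
import java.time.Instant;
import java.util.Optional;

/** Illustrative only: a hypothetical model of "why was this replica marked unhealthy". */
final class UnhealthyReasonSketch {

    /** The three reasons listed in the description (names are assumptions). */
    enum Reason { CORRUPT_BLOCK, CORRUPT_CONTAINER_FILE, VOLUME_FAILURE }

    /** One event: which container, why, when, and which block if a block was corrupted. */
    static final class Event {
        final long containerId;
        final Reason reason;
        final Optional<Long> blockId;
        final Instant time;

        private Event(long containerId, Reason reason, Optional<Long> blockId, Instant time) {
            this.containerId = containerId;
            this.reason = reason;
            this.blockId = blockId;
            this.time = time;
        }

        /** A corrupted block must always name the block. */
        static Event corruptBlock(long containerId, long blockId, Instant time) {
            return new Event(containerId, Reason.CORRUPT_BLOCK, Optional.of(blockId), time);
        }

        /** Reasons that are not tied to a single block. */
        static Event of(long containerId, Reason reason, Instant time) {
            if (reason == Reason.CORRUPT_BLOCK) {
                throw new IllegalArgumentException("CORRUPT_BLOCK requires a block ID");
            }
            return new Event(containerId, reason, Optional.empty(), time);
        }

        /** Renders the event as a single key=value line that is easy to grep. */
        String toLogLine() {
            StringBuilder sb = new StringBuilder();
            sb.append(time).append(" containerID=").append(containerId)
              .append(" reason=").append(reason);
            blockId.ifPresent(id -> sb.append(" blockID=").append(id));
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        Event e = Event.corruptBlock(7L, 42L, Instant.now());
        System.out.println(e.toLogLine());
    }
}
```

      Keeping the block ID optional and enforcing it only for the corrupted-block case mirrors the "and which block was corrupted" note; how the line is stored is a separate choice, covered by the options that follow.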

      Some options for persisting the information are:

      • Into the .container file itself.
        • May not work if the container file is corrupted.
      • To the datanode audit log
        • Would get mixed up with client operations like put block.
      • To a different file within the container
        • This could be used to track the entire lifecycle of the container, like when it was created, closed, replicated, and marked unhealthy.
      • To a dedicated log4j logger that can be configured to go to a different file.
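
      The last option, a dedicated logger routed to its own file, can be sketched as follows. To stay self-contained this sketch swaps in `java.util.logging` from the JDK rather than log4j, so it shows the idea and not the log4j configuration itself: a named logger gets its own file handler and is detached from its parent handlers, which plays the role that turning off additivity plays in log4j. The logger name, class name, and message format are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Illustrative only: a dedicated logger that writes container events to its own file. */
final class ContainerLogSketch {

    // Hypothetical logger name; a real deployment would pick its own.
    static final String LOGGER_NAME = "sketch.container.lifecycle";

    /** Returns a logger whose records go only to the given file, not to the main log. */
    static Logger open(Path file) throws IOException {
        Logger logger = Logger.getLogger(LOGGER_NAME);
        logger.setUseParentHandlers(false);            // keep events out of the main log
        FileHandler handler = new FileHandler(file.toString(), true); // append mode
        handler.setFormatter(new Formatter() {
            @Override
            public String format(LogRecord record) {
                return record.getMessage() + System.lineSeparator(); // one event per line
            }
        });
        logger.addHandler(handler);
        return logger;
    }

    /** Writes one message through the dedicated logger and returns the file contents. */
    static String writeAndReadBack(String message) {
        try {
            Path file = Files.createTempFile("container-sketch", ".log");
            Logger logger = open(file);
            logger.info(message);
            for (Handler h : logger.getHandlers()) {   // flush and release the file
                h.close();
                logger.removeHandler(h);
            }
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(writeAndReadBack("containerID=7 reason=VOLUME_FAILURE"));
    }
}
```

      Because the destination lives in the handler setup rather than at the call sites, the same events could later be redirected or given their own rotation policy without touching the code that reports them.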

      Attachments

        1. container_log_v1.pdf
          85 kB
          Ethan Rose

            People

              Assignee: Ethan Rose (erose)
              Reporter: Ethan Rose (erose)