HDDS-7097 (sub-task of HDDS-7364: Improved container scanning), Apache Ozone

Container scanner log output lacks useful information


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: None

    Description

      Currently, the output from the container scanner may look like this:

      2022-08-04 14:16:37,702 WARN org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: Moving container /hadoop-ozone/datanode/data/hdds/CID-5612c780-06f8-4ac5-9eae-498159abd009/current/containerDir1/1008 to state UNHEALTHY from state:UNHEALTHY Trace:java.base/java.lang.Thread.getStackTrace(Thread.java:1606)
      org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
      org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.markContainerUnhealthy(KeyValueContainer.java:335)
      org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.markContainerUnhealthy(KeyValueHandler.java:1017)
      org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.markContainerUnhealthy(ContainerController.java:116)
      org.apache.hadoop.ozone.container.ozoneimpl.ContainerDataScanner.runIteration(ContainerDataScanner.java:108)
      org.apache.hadoop.ozone.container.ozoneimpl.ContainerDataScanner.run(ContainerDataScanner.java:81)
      ...
      2022-08-04 14:30:19,407 ERROR org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerCheck: Corruption detected in container: [2] Exception: [null]
      

      There are numerous problems with this:

      • The previous container state is not logged. Instead, the new UNHEALTHY state is incorrectly logged as the previous state ("to state UNHEALTHY from state:UNHEALTHY").
      • Only the message of the exception identifying the corruption is printed. The exception object itself should be logged, both to identify the failure more precisely and to catch cases like the one above, where the exception has no message at all ("Exception: [null]", probably caused by a bug).
      • The stack trace of the call to KeyValueContainer#markContainerUnhealthy is logged, which is both verbose and not useful.
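      The three problems above can be illustrated with a minimal sketch. It is not the actual Ozone fix: the class ContainerStateLogging, the State enum, the Container holder, and the methods transitionMessage and markUnhealthy are all hypothetical stand-ins. It uses java.util.logging only to stay self-contained. The sketch captures the previous state before mutating it, passes the exception object (not just its message) to the logger, and attaches no stack trace of the caller.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ContainerStateLogging {

    // Hypothetical stand-in for the datanode container states (not Ozone's real enum).
    enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED, UNHEALTHY }

    private static final Logger LOG =
        Logger.getLogger(ContainerStateLogging.class.getName());

    // Hypothetical container holder with a mutable state.
    static final class Container {
        final long id;
        State state;

        Container(long id, State state) {
            this.id = id;
            this.state = state;
        }
    }

    // Builds the transition message from the state captured *before* the update,
    // so "from state" can never accidentally show the new state.
    static String transitionMessage(Container c, State previous, State next) {
        return String.format("Moving container %d to state %s from state %s",
            c.id, next, previous);
    }

    // Hypothetical replacement for markContainerUnhealthy.
    static String markUnhealthy(Container c, Exception reason) {
        State previous = c.state;      // capture before mutating
        c.state = State.UNHEALTHY;
        String msg = transitionMessage(c, previous, c.state);
        // Pass the Throwable itself: the default formatter prints its class name
        // and stack trace even when getMessage() returns null. No stack trace of
        // the calling thread is attached to the message.
        LOG.log(Level.WARNING, msg, reason);
        return msg;
    }

    public static void main(String[] args) {
        Container c = new Container(2, State.CLOSED);
        // An exception without a message, like the "Exception: [null]" case above.
        String msg = markUnhealthy(c, new NullPointerException());
        System.out.println(msg);
    }
}
```

      With SLF4J, the same effect comes from passing the exception as the final argument after the placeholder arguments, for example LOG.warn("Moving container {} to state {} from state {}", id, next, previous, e); SLF4J then logs the full exception rather than only its message.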


          People

            Assignee: Dave Teng (dteng)
            Reporter: Ethan Rose (erose)
            Votes: 0
            Watchers: 2
