Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Currently, the output from the container scanner may look like this:
2022-08-04 14:16:37,702 WARN org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: Moving container /hadoop-ozone/datanode/data/hdds/CID-5612c780-06f8-4ac5-9eae-498159abd009/current/containerDir1/1008 to state UNHEALTHY from state:UNHEALTHY Trace:java.base/java.lang.Thread.getStackTrace(Thread.java:1606)
    org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1058)
    org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.markContainerUnhealthy(KeyValueContainer.java:335)
    org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.markContainerUnhealthy(KeyValueHandler.java:1017)
    org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.markContainerUnhealthy(ContainerController.java:116)
    org.apache.hadoop.ozone.container.ozoneimpl.ContainerDataScanner.runIteration(ContainerDataScanner.java:108)
    org.apache.hadoop.ozone.container.ozoneimpl.ContainerDataScanner.run(ContainerDataScanner.java:81)
...
2022-08-04 14:30:19,407 ERROR org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerCheck: Corruption detected in container: [2] Exception: [null]
There are several problems with this:
- The previous container state is not logged. The new unhealthy state is incorrectly logged as the previous state.
- The exception identifying the corruption only has its message printed. The exception object itself should be logged to better identify the failure and to catch cases like the one above, where there is no exception message (probably caused by a bug).
- The stack trace of the call to KeyValueContainer#markContainerUnhealthy is logged, which is both verbose and not useful.
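A rough sketch of what the three fixes could look like is given below. It is not the actual Ozone code: the class ContainerLogSketch, its State enum, and the methods markUnhealthy and describeCorruption are hypothetical stand-ins, and java.util.logging is used only so the example runs without extra dependencies (the real classes use a different logging setup).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a container class; names are illustrative only.
class ContainerLogSketch {
    private static final Logger LOG =
        Logger.getLogger(ContainerLogSketch.class.getName());

    // Hypothetical, simplified container states.
    enum State { OPEN, CLOSED, UNHEALTHY }

    private final long containerId;
    private State state = State.CLOSED;

    ContainerLogSketch(long containerId) {
        this.containerId = containerId;
    }

    State getState() {
        return state;
    }

    // Capture the previous state BEFORE overwriting it, so the message
    // reports the real transition. No caller stack trace is appended.
    String markUnhealthy() {
        State previous = state;
        state = State.UNHEALTHY;
        String msg = String.format(
            "Moving container %d to state %s from state %s",
            containerId, state, previous);
        LOG.warning(msg);
        return msg;
    }

    // Hand the exception object to the logger instead of only its message,
    // so the type and stack trace survive even when getMessage() is null.
    static String describeCorruption(long containerId, Exception e) {
        String msg = "Corruption detected in container " + containerId;
        LOG.log(Level.SEVERE, msg, e);
        // Throwable.toString() includes the class name, unlike getMessage().
        return msg + ": " + e;
    }

    public static void main(String[] args) {
        ContainerLogSketch container = new ContainerLogSketch(1008L);
        container.markUnhealthy();
        // An exception without a message, as in the "Exception: [null]" case.
        describeCorruption(2L, new IllegalStateException());
    }
}
```

With this shape, an exception that carries no message still shows up in the log by its type and stack trace rather than as "null".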
Issue Links
- relates to HDDS-7413 Fix logging while marking container state unhealthy (Resolved)