[HDDS-8062] Persist reason for container replica being marked unhealthy

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.4.0
Component/s: None
Labels:
- pull-request-available

Description

Once the scanner marks a container replica unhealthy, it would help debugging to persist the reason it was marked unhealthy. Logging the reason only to the main datanode log is not enough: those entries eventually roll off, and finding them requires extra filtering to work out what happened.

Reasons for marking unhealthy include:

- Corrupted block (and which block was corrupted)
- Corrupted container metadata file
- Volume failure

Some options for persisting the information are:

- Into the .container file itself.
  - May not work if the container file is corrupted.
- To the datanode audit log.
  - Would get mixed up with client operations like put block.
- To a different file within the container.
  - This could be used to track the entire lifecycle of the container, like when it was created, closed, replicated, and marked unhealthy.
- To a dedicated log4j logger that can be configured to go to a different file.
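
As a rough illustration of the last option, the sketch below writes one line per unhealthy event to a dedicated, named logger. Everything in it is an assumption made for illustration: the class name `ContainerLogSketch`, the `UnhealthyReason` enum, the logger name `ContainerLog`, and the `ID=... | State=... | Reason=...` line layout are hypothetical and are not taken from the attached design document or from the Ozone code. It also uses `java.util.logging` as a stand-in for log4j so that it stays self-contained; with log4j, the same named logger would be routed to its own file through an appender in the logging configuration.

```java
import java.util.logging.Logger;

public class ContainerLogSketch {

  // Hypothetical reason codes mirroring the reasons listed in the description.
  enum UnhealthyReason { CORRUPT_BLOCK, CORRUPT_CONTAINER_FILE, VOLUME_FAILURE }

  // Dedicated logger, separate from the main datanode logger.
  // The name "ContainerLog" is an assumption for this sketch.
  private static final Logger CONTAINER_LOG = Logger.getLogger("ContainerLog");

  // Builds one line describing why a replica was marked unhealthy.
  // 'detail' is optional, e.g. which block was corrupted.
  static String format(long containerId, UnhealthyReason reason, String detail) {
    StringBuilder sb = new StringBuilder();
    sb.append("ID=").append(containerId)
        .append(" | State=UNHEALTHY")
        .append(" | Reason=").append(reason);
    if (detail != null && !detail.isEmpty()) {
      sb.append(" | Detail=").append(detail);
    }
    return sb.toString();
  }

  // Writes the formatted line to the dedicated logger.
  static void logUnhealthy(long containerId, UnhealthyReason reason, String detail) {
    CONTAINER_LOG.warning(format(containerId, reason, detail));
  }

  public static void main(String[] args) {
    logUnhealthy(12L, UnhealthyReason.CORRUPT_BLOCK, "block 345");
    logUnhealthy(7L, UnhealthyReason.VOLUME_FAILURE, null);
  }
}
```

Keeping the line format in one function means the same record could later be reused for other lifecycle events (created, closed, replicated) if the log were extended that way.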

Attachments

container_log_v1.pdf, 05/Jun/23 15:34, 85 kB, Ethan Rose

Issue Links

is fixed by: HDDS-9002 Container log should write to separate file (Resolved)

links to: GitHub Pull Request #4995

People

Assignee: Ethan Rose
Reporter: Ethan Rose
Votes: 0
Watchers: 2

Dates

Created: 01/Mar/23 23:37
Updated: 27/Oct/23 17:52
Resolved: 11/Jul/23 08:55