Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
None
None
None
Description
We have a cluster in which some datanodes' disks are corrupt, but it took us a few days to become aware of the problem. Adding a metric that keeps track of the number of reported corrupt replicas would allow us to raise an alert when an unusual number of corrupt replicas is reported.
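
To make the request concrete, here is a minimal sketch of what such a counter could look like. It is illustrative only and not an existing Hadoop class: the name `CorruptReplicaCounter`, its methods, and the threshold-based check are all assumptions, and a real patch would presumably expose the value through the existing metrics system rather than a standalone class.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch, not part of Hadoop: counts reported corrupt
 * replicas and tells the caller when the count grew unusually fast.
 */
class CorruptReplicaCounter {
    // Total number of corrupt replicas reported since startup.
    private final AtomicLong reported = new AtomicLong();
    // Value of the total at the previous check; guarded by "this".
    private long lastSeen = 0;

    /** Call once for every corrupt replica report received. */
    void incrReported() {
        reported.incrementAndGet();
    }

    /** Cumulative count, suitable for exporting as a metric. */
    long getReported() {
        return reported.get();
    }

    /**
     * Returns true if more than {@code threshold} new reports arrived
     * since the previous call, then resets the comparison baseline.
     */
    synchronized boolean exceedsSinceLastCheck(long threshold) {
        long now = reported.get();
        long delta = now - lastSeen;
        lastSeen = now;
        return delta > threshold;
    }
}
```

An alerting job could call `exceedsSinceLastCheck` on a fixed schedule with a threshold chosen from the cluster's normal rate of corruption reports.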