Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Done
Description
This Jira tracks a few improvements to DataNode’s usage of DiskChecker to address the following problems:
- Checks are serialized so a single slow disk can indefinitely delay checking the rest.
- No detection of stalled checks (related to the serialization problem above).
- Lack of granularity. A single IO error initiates checking all disks.
- Inconsistent activation. Some DataNode IO failures trigger disk checks but not all.
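
To make the first two problems concrete, here is a minimal sketch of running per-disk checks in parallel with an overall time limit, so that one slow disk neither delays the others nor goes undetected. It is an illustration only and not the DataNode implementation: the class `ParallelDiskCheckSketch`, the method `checkAll`, and the `Result` enum are hypothetical names, and each disk check is assumed to be a plain `Callable<Boolean>` that returns `true` when the disk is healthy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from Hadoop.
final class ParallelDiskCheckSketch {
    enum Result { HEALTHY, FAILED, TIMED_OUT }

    /**
     * Runs all checks concurrently and waits at most timeoutMs in total.
     * A check that has not finished by then is reported as TIMED_OUT
     * instead of blocking the results for the other disks.
     */
    static Map<String, Result> checkAll(Map<String, Callable<Boolean>> checks,
                                        long timeoutMs) throws InterruptedException {
        List<String> names = new ArrayList<>(checks.keySet());
        List<Callable<Boolean>> tasks = new ArrayList<>(checks.values());
        // One thread per disk, so a stalled check cannot starve the rest.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            // invokeAll cancels every task still running when the timeout expires
            // and returns the futures in the same order as the task list.
            List<Future<Boolean>> futures =
                pool.invokeAll(tasks, timeoutMs, TimeUnit.MILLISECONDS);
            Map<String, Result> results = new LinkedHashMap<>();
            for (int i = 0; i < names.size(); i++) {
                results.put(names.get(i), classify(futures.get(i)));
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }

    private static Result classify(Future<Boolean> future) throws InterruptedException {
        if (future.isCancelled()) {
            return Result.TIMED_OUT; // stalled check, cancelled by invokeAll
        }
        try {
            return future.get() ? Result.HEALTHY : Result.FAILED;
        } catch (ExecutionException e) {
            return Result.FAILED; // the check itself threw, e.g. an IOException
        }
    }
}
```

Passing only the affected volume's check to `checkAll` would address the granularity problem in the same sketch, since nothing in it requires checking every disk at once.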