Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
On a large cluster, a few datanodes may be under-performing. There have been cases where the network connectivity of these bad datanodes was degraded, resulting in very long transfer times (on the order of two hours) for blocks moving to and from them.
A similar issue arises when a single disk on a datanode fails or becomes read-only: in that case the entire datanode shuts down.
HDFS should detect and handle network and disk performance degradation more gracefully. One option would be to blacklist these datanodes, de-prioritise their use, and alert the administrator; a rough sketch of such a detector follows.
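As a minimal illustration of the de-prioritisation idea (the class, thresholds, and metric below are hypothetical, not existing HDFS code), the namenode could keep a per-datanode moving average of block-transfer throughput and flag nodes that fall far below the cluster-wide mean:

    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    /**
     * Hypothetical sketch, not existing HDFS code: tracks an exponentially
     * weighted moving average (EWMA) of block-transfer throughput per
     * datanode and flags nodes far below the cluster-wide mean.
     */
    public class SlowDatanodeTracker {
      private static final double ALPHA = 0.2;          // EWMA smoothing factor
      private static final double SLOW_FRACTION = 0.1;  // "slow" = below 10% of mean

      private final Map<String, Double> ewmaByNode = new ConcurrentHashMap<>();

      /** Record a completed block transfer for the given datanode. */
      public void recordTransfer(String datanodeId, long bytes, long millis) {
        double throughput = (double) bytes / Math.max(millis, 1L);
        ewmaByNode.merge(datanodeId, throughput,
            (old, sample) -> (1 - ALPHA) * old + ALPHA * sample);
      }

      /** Datanodes to de-prioritise and report to the administrator. */
      public Set<String> slowDatanodes() {
        double clusterMean = ewmaByNode.values().stream()
            .mapToDouble(Double::doubleValue).average().orElse(0.0);
        Set<String> slow = ConcurrentHashMap.newKeySet();
        ewmaByNode.forEach((node, ewma) -> {
          if (ewma < SLOW_FRACTION * clusterMean) {
            slow.add(node);
          }
        });
        return slow;
      }
    }

A moving average is used here so that a single slow transfer does not immediately blacklist a node, while sustained degradation is still caught; the flagged set could feed both replica-placement decisions and administrator alerts.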
Attachments
Issue Links
- duplicates
  - HADOOP-2830 Need to instrument Hadoop to get comprehensive network traffic metrics (Resolved)
  - HDFS-324 Generate a network infrastructure map (Resolved)
- relates to
  - HDFS-97 DFS should detect slow links(nodes) and avoid them (Resolved)