Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Environment: Hadoop 80 node cluster
Description
I ran a simple experiment: I shut down one node in the cluster and measured the time taken to replicate the under-replicated blocks.
About 30,000 blocks became under-replicated, roughly 400 per node. At 2 blocks per heartbeat and a 1-minute heartbeat interval, replicating them all should take about 200 minutes.
Finding: it took around 220 minutes, which is reasonable given that estimate.
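The back-of-the-envelope estimate above can be written out as a small calculation. This is a sketch, not Hadoop code: the function name is hypothetical, and it assumes the backlog is spread evenly across datanodes and that each datanode receives a fixed number of blocks on every heartbeat.

```python
import math

def estimate_replication_minutes(blocks_per_node: float,
                                 blocks_per_heartbeat: int,
                                 heartbeat_interval_s: float) -> float:
    """Hypothetical helper: rough time (minutes) for one datanode to work
    through its share of the replication backlog when it is handed
    `blocks_per_heartbeat` blocks on every heartbeat."""
    heartbeats_needed = math.ceil(blocks_per_node / blocks_per_heartbeat)
    return heartbeats_needed * heartbeat_interval_s / 60.0

# ~30000 under-replicated blocks over ~80 nodes is 375 per node,
# which the report rounds to ~400.
print(30000 / 80)                                 # 375.0
print(estimate_replication_minutes(400, 2, 60))   # 200.0
```

The observed 220 minutes is close to this 200-minute figure, which suggests the per-heartbeat block quota, not datanode capacity, is what limits the rate.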
Bug: replication is coupled with the heartbeat. The heartbeat interval is chosen based on how much load the namenode can handle, whereas the replication rate should be based on how much a datanode can handle.
With the default heartbeat interval of 20 seconds, we estimated that a datanode can handle 2 replications per interval, and on that basis the namenode hands out 2 blocks to replicate per heartbeat.
We propose keeping the 2 blocks per 20 seconds ratio constant, so a datanode with a 1-minute heartbeat interval should be given 6 blocks to replicate per heartbeat. In the experiment above, this would cut the time to replicate the entire node's blocks from 200 minutes to 200/3, about 67 minutes (roughly 1 hour).
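A minimal sketch of the proposed scaling, assuming the namenode simply multiplies the 2-blocks-per-20-seconds rate by each datanode's heartbeat interval. The constant and function names are hypothetical and do not correspond to Hadoop's actual implementation or configuration keys.

```python
import math

# Hypothetical constants taken from the ratio stated in the report:
# 2 blocks per 20-second heartbeat.
REFERENCE_INTERVAL_S = 20
REFERENCE_BLOCKS = 2

def blocks_per_heartbeat(heartbeat_interval_s: float) -> int:
    """Blocks to hand a datanode per heartbeat so that its replication rate
    stays at REFERENCE_BLOCKS per REFERENCE_INTERVAL_S seconds (at least 1)."""
    scaled = REFERENCE_BLOCKS * heartbeat_interval_s / REFERENCE_INTERVAL_S
    return max(1, round(scaled))

def drain_minutes(blocks_per_node: int, heartbeat_interval_s: float) -> float:
    """Minutes to replicate `blocks_per_node` blocks under the scaled scheme."""
    per_beat = blocks_per_heartbeat(heartbeat_interval_s)
    return math.ceil(blocks_per_node / per_beat) * heartbeat_interval_s / 60.0

print(blocks_per_heartbeat(20))   # 2
print(blocks_per_heartbeat(60))   # 6
print(drain_minutes(400, 60))     # 67.0
```

The `max(1, ...)` floor is one possible design choice: it keeps very short heartbeat intervals from rounding the quota down to zero, at the cost of slightly exceeding the reference rate in that case.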
Issue Links
- is duplicated by
  - HADOOP-2606 Namenode unstable when replicating 500k blocks at once (Closed)
- is related to
  - HADOOP-2606 Namenode unstable when replicating 500k blocks at once (Closed)