Details
- Bug
- Status: Closed
- Minor
- Resolution: Fixed
- 0.20.0
- None
- Reviewed
Description
The "Replica Placement: The First Baby Steps" section of the HDFS architecture document states:
"...
For the common case, when the replication factor is three, HDFS's placement policy is to put one replica on one node in the local rack, another on a different node in the local rack, and the last on a different node in a different rack. This policy cuts the inter-rack write traffic which generally improves write performance.
..."
However, according to the code in ReplicationTargetChooser.chooseTarget(), the actual logic places the second replica on a different rack, and the third replica on that rack as well. So two replicas end up on different nodes of one remote rack, and one (the initial replica) stays on a node in the local rack. Thus, the sentence should say something like this:
"For the common case, when the replication factor is three, HDFS's placement policy is to put one replica on one node in the local rack, another on a node in a different (remote) rack, and the last on a different node in the same remote rack. This policy cuts the inter-rack write traffic which generally improves write performance."
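
To make the corrected wording concrete, here is a minimal sketch of the placement order it describes. It is an illustration only, not the code of ReplicationTargetChooser: the class `PlacementSketch`, the `Node` type, and the method `chooseTargets` are hypothetical names, and the sketch picks the first matching node deterministically, whereas the real chooser may select nodes differently and has fallback rules that are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT the real ReplicationTargetChooser.
class PlacementSketch {

    // A datanode identified by its name and the rack it lives on (hypothetical type).
    static final class Node {
        final String name;
        final String rack;

        Node(String name, String rack) {
            this.name = name;
            this.rack = rack;
        }

        @Override
        public String toString() {
            return rack + "/" + name;
        }
    }

    // Returns up to three targets for replication factor three:
    // the writer's node, a node on a remote rack, and a different
    // node on that same remote rack.
    static List<Node> chooseTargets(Node writer, List<Node> cluster) {
        List<Node> targets = new ArrayList<>();
        targets.add(writer); // replica 1: node on the local rack

        // Replica 2: first node found on any rack other than the writer's.
        Node second = null;
        for (Node n : cluster) {
            if (!n.rack.equals(writer.rack)) {
                second = n;
                break;
            }
        }
        if (second == null) {
            return targets; // single-rack cluster: this sketch simply stops
        }
        targets.add(second);

        // Replica 3: a different node on the same rack as replica 2.
        for (Node n : cluster) {
            if (n != second && n.rack.equals(second.rack)) {
                targets.add(n);
                break;
            }
        }
        return targets;
    }
}
```

With a writer on rackA and two nodes available on rackB, the sketch returns the writer's node followed by the two rackB nodes. Data crosses racks only once, between the first and second replica, which matches the "cuts the inter-rack write traffic" claim in the proposed sentence.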