|
[
Permlink
| « Hide
]
Mark Butler added a comment - 01/Feb/08 03:59 PM
This test case demonstrates the problem with last nodes on the rack at the getLeaf level.
This amended test case demonstrates that NetworkTopology.chooseRandom(~node) will not return the last node on the rack.
This is called by org.apache.hadoop.dfs.ReplicationTargetChooser#chooseRandom(int, String, List<Node>), which in turn is called by chooseRandom(int, String, List<Node>, long, int, List<DatanodeDescriptor> which are called by chooseRemoteRack(int, DatanodeDescriptor, List<Node>, long, int, List<DatanodeDescriptor>) chooseTarget is an important method for placing replicas. Further examination shows the problem only occurs when you are excluding a node, not when you are excluding a rack. So it may not be a problem for DFS ...
However, if it does occur, it is not just a problem of not using a node, it also causes another node to be a "hotspot" as it is twice as likely as any other node to get a replica placement., which is clearly undesirable .. Can someone more familiar with DFS replication code comment if this is likely to be a problem? Mark, thank you for your time investigating this issue. Yes, this is a bug, wich occurs when the excluded node is a leaf. Fortunately dfs block allocation strategy only excludes racks. So the bug does not have an effect on the dfs block allocation. Could you please check if the attached patch fixed the problem?
Hairong, thanks for the speedy reply.
As you note, looking at the code, DFS does not call this to exclude nodes. So it is not a problem for DFS. Specifically excluding racks happens in NetworkTopology.chooseRandom(String) while excluding nodes occurs in ReplicationTargetChooser.chooseRandom(int numReplicas, String nodes, List<Node> excludedNodes). However you have written the patch, so my suggestion is we fix this. I took a look at the patch and there is a small error in it - it said if (excludedNode instanceof InnerNode) { numOfExcludedLeaves = ((InnerNode)excludedNode).getNumOfLeaves(); isLeaf = true; } but if it is an inner node, it can't be a leaf? Consequently it fails the test so I think it should be if (excludedNode instanceof InnerNode) { numOfExcludedLeaves = ((InnerNode)excludedNode).getNumOfLeaves(); } else { isLeaf = true; } I've produced an updated patch and also modified my test code and incorporated it into TestNetworkTopology.java. This should resolve the issue. Thanks again! This is a modified patch file, it fixes a problem with the previous patch and adds unit tests for this change.
Mark, thank you for providing junit tests for the patch. They look good. The patch that I am uploading have two minor changes.
excludedLeaf3.patch: +1
codes look good, I suggest the following change: ...
int numOfExcludedLeaves = 1;
boolean isLeaf = !(excludedNode instanceof InnerNode);
if (!isLeaf) {
numOfExcludedLeaves = ((InnerNode)excludedNode).getNumOfLeaves();
}
Incorporated Nicholas's comment.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12374834/excludedLeaf4.patch against trunk revision 616796. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1747/testReport/ This message is automatically generated. I just committed this. Thanks Mark and Hairong!
Integrated in Hadoop-trunk #394 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/394/
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||