|
[
Permlink
| « Hide
]
dhruba borthakur added a comment - 02/Nov/08 07:20 AM
FileSystem.getFileBlockLocation also returns the rack location of the blocks that are returned.
Review comments welcome.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12393204/dfsRackLocation.patch against trunk revision 709609. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3519/testReport/ This message is automatically generated. This looks good. The only debatable point is whether to introduce a new 'racks' variable in BlockLocations or to just prefix the network topology in the hosts variable itself. Having a separate racks variable implies that there is an implicit requirement that the the ordering of hosts and the racks are identical (which is true in this case). However, having a separate racks variable does makes it easier to handle cases where there is no topology information available, just a simple check on the racks variable would do instead of adding logic during the parsing of host names. I am fine with either approach.
Thanks for the review comments. I made it a separate variable so that the API clearly identifies the hostname and the network topology.
>Having a separate racks variable implies that there is an implicit requirement that the the ordering of hosts and the racks are identical (which is true in this case I missed your point. Can you pl explain it in greater detal? Thanks. Sorry, let me try to explain it better. Let us assume that the replication factor is 3 and we have one block. GetFileBlockLocation would return in BlockLocations[0].hosts {h1, h2, h3} and in BlockLocations[0].racks {r1,r1,r2} where h1 and h2 are in r1 and h3 is in r2. The topologyPaths for the different hosts are determined by their relative positions in their arrays. The topologyPath for h1, which is in index 0 of hosts array, is r1 (index 0 in the racks array). An alternative could have just been to return /r1/h1, /r1/h2 and /r2/h3 in the hosts. That way, if somebody wants both the host and rack information, they do not need to construct it by reading the same index in two arrays. Makes sense?
Got it. Actually the queston you are raising is whether BlockLocation.getTopologyPaths() should return /r1 or should it return /r1/h1. I would prefer that this new method return /r1/h1. If somebody wants only the hostname, they can continue to use the existing method BlockLocation.getHostName().
BlockLocation.getTopologyPath() returns the full network location of each of the hosts. Th last component of each path is the host:port.
GetTopologyPaths returning /r1/h1 looks good to me as well.
Patch looks good. Is the
System.out.println("XXX" + topologyPaths[j]); Thanks for catching the debug print. I have been making this mistake too often in the last few days
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12393321/dfsRackLocation3.patch against trunk revision 711350. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3528/testReport/ This message is automatically generated. GetFileBlockLocations returns the NetworkTopology information of the machines on where the blocks reside.
Edit release note for publication.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||