Description
The test TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails intermittently. The stack trace (https://builds.apache.org/job/PreCommit-HDFS-Build/16534/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithMultipleNameNodes/testBalancing2OutOf3Blockpools/):
java.io.IOException: Creating block, no free space available
    at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset$BInfo.<init>(SimulatedFSDataset.java:151)
    at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.injectBlocks(SimulatedFSDataset.java:580)
    at org.apache.hadoop.hdfs.MiniDFSCluster.injectBlocks(MiniDFSCluster.java:2679)
    at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.unevenDistribution(TestBalancerWithMultipleNameNodes.java:405)
    at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancing2OutOf3Blockpools(TestBalancerWithMultipleNameNodes.java:516)
The error message means that the datanode's capacity has been used up and there is no space left to create a new block.
I looked into the code and found that the main cause seems to be that the cluster capacities are not set up correctly when the cluster is started the second time, just before the test redistributes the blocks.
The related code:
// Here we do redistribute blocks nNameNodes times for each node,
// we need to adjust the capacities. Otherwise it will cause the no
// free space errors sometimes.
final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .nnTopology(MiniDFSNNTopology.simpleFederatedTopology(nNameNodes))
    .numDataNodes(nDataNodes)
    .racks(racks)
    .simulatedCapacities(newCapacities)
    .format(false)
    .build();
LOG.info("UNEVEN 11");
...
for(int n = 0; n < nNameNodes; n++) {
  // redistribute blocks
  final Block[][] blocksDN = TestBalancer.distributeBlocks(
      blocks[n], s.replication, distributionPerNN);

  for(int d = 0; d < blocksDN.length; d++)
    cluster.injectBlocks(n, d, Arrays.asList(blocksDN[d]));

  LOG.info("UNEVEN 13: n=" + n);
}
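Because the blocks are redistributed once per NameNode, the totalUsed value grows to nNameNodes*usedSpacePerNN rather than usedSpacePerNN, so capacities sized for a single usedSpacePerNN can run out of free space.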
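To make the arithmetic concrete, here is a minimal, self-contained sketch of the accounting problem. It does not use any Hadoop classes. The class CapacitySketch, its methods, and the numbers are hypothetical and only illustrate the idea that injecting blocks once per NameNode multiplies the used space by nNameNodes; the real bookkeeping in SimulatedFSDataset may differ in detail.

```java
public class CapacitySketch {

    // Hypothetical model: every NameNode (block pool) injects usedSpacePerNN
    // bytes into the same simulated cluster, so the usage adds up.
    static long totalInjected(int nNameNodes, long usedSpacePerNN) {
        return (long) nNameNodes * usedSpacePerNN;
    }

    // True if the total capacity can hold the blocks injected by all NameNodes.
    static boolean fits(long totalCapacity, int nNameNodes, long usedSpacePerNN) {
        return totalInjected(nNameNodes, usedSpacePerNN) <= totalCapacity;
    }

    public static void main(String[] args) {
        long totalCapacity = 1000L;  // illustrative value, not from the test
        long usedSpacePerNN = 400L;  // illustrative value, not from the test
        int nNameNodes = 3;

        // Sized for a single NameNode's usage: 400 <= 1000, so this fits.
        System.out.println("one NameNode fits: "
            + fits(totalCapacity, 1, usedSpacePerNN));

        // All three NameNodes inject blocks: 3 * 400 = 1200 > 1000, no free space.
        System.out.println("three NameNodes fit: "
            + fits(totalCapacity, nNameNodes, usedSpacePerNN));
    }
}
```

Under this simplified model, the capacities handed to simulatedCapacities need to account for nNameNodes*usedSpacePerNN rather than a single usedSpacePerNN.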