  1. Hadoop HDFS
  2. HDFS-10803

TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails intermittently due to no free space available

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1.0, 2.10.0, 2.9.2, 2.8.5
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The test TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails intermittently. The stack trace (https://builds.apache.org/job/PreCommit-HDFS-Build/16534/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithMultipleNameNodes/testBalancing2OutOf3Blockpools/) is:

      java.io.IOException: Creating block, no free space available
      	at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset$BInfo.<init>(SimulatedFSDataset.java:151)
      	at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.injectBlocks(SimulatedFSDataset.java:580)
      	at org.apache.hadoop.hdfs.MiniDFSCluster.injectBlocks(MiniDFSCluster.java:2679)
      	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.unevenDistribution(TestBalancerWithMultipleNameNodes.java:405)
      	at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancing2OutOf3Blockpools(TestBalancerWithMultipleNameNodes.java:516)
      

      The error message means that the datanode's capacity has been used up, so there is no space left to create a new block.

      Looking into the code, the main cause seems to be that the cluster capacities are not set correctly when the cluster is started for the second time, just before the test redistributes the blocks.
      The related code:

            // Here we do redistribute blocks nNameNodes times for each node,
            // we need to adjust the capacities. Otherwise it will cause the no 
            // free space errors sometimes.
            final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
                .nnTopology(MiniDFSNNTopology.simpleFederatedTopology(nNameNodes))
                .numDataNodes(nDataNodes)
                .racks(racks)
                .simulatedCapacities(newCapacities)
                .format(false)
                .build();
            LOG.info("UNEVEN 11");
              ...
              for(int n = 0; n < nNameNodes; n++) {
                // redistribute blocks
                final Block[][] blocksDN = TestBalancer.distributeBlocks(
                    blocks[n], s.replication, distributionPerNN);
          
                for(int d = 0; d < blocksDN.length; d++)
                  cluster.injectBlocks(n, d, Arrays.asList(blocksDN[d]));
      
                LOG.info("UNEVEN 13: n=" + n);
              }
      

      The loop injects blocks once per NameNode, so the totalUsed value grows by nNameNodes*usedSpacePerNN rather than by usedSpacePerNN, and the datanodes can run out of simulated capacity.
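
The capacity arithmetic can be sketched in isolation. This is a minimal illustration, not the attached patch: `CapacitySketch`, its methods, and the sample numbers are hypothetical, and it only models cluster-wide totals, ignoring how blocks are spread across individual datanodes. Scaling each capacity by nNameNodes is one assumed way to size the capacities for the real load.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Hadoop: models the capacity arithmetic only. */
final class CapacitySketch {

    /** Bytes used after blocks have been injected once per NameNode (block pool). */
    static long totalUsed(int nNameNodes, long usedSpacePerNN) {
        return Math.multiplyExact((long) nNameNodes, usedSpacePerNN);
    }

    /** True if the summed datanode capacity can hold all injected blocks. */
    static boolean fits(long[] capacities, int nNameNodes, long usedSpacePerNN) {
        long totalCapacity = Arrays.stream(capacities).sum();
        return totalUsed(nNameNodes, usedSpacePerNN) <= totalCapacity;
    }

    /** Assumed adjustment: scale every datanode capacity by the number of NameNodes. */
    static long[] scaledCapacities(long[] capacities, int nNameNodes) {
        return Arrays.stream(capacities)
                .map(c -> Math.multiplyExact(c, (long) nNameNodes))
                .toArray();
    }
}
```

With two datanodes of 500 bytes each and usedSpacePerNN = 400, one block pool fits (400 of 1000 bytes), but three pools need 1200 bytes and do not fit until the capacities are scaled.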

      Attachments

        1. HDFS-10803.001.patch
          2 kB
          Yiqun Lin

          People

            linyiqun Yiqun Lin
            linyiqun Yiqun Lin