Hadoop HDFS / HDFS-9599

TestDecommissioningStatus.testDecommissionStatus occasionally fails

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: namenode
    • Labels: None
    • Environment: Jenkins
    • Hadoop Flags: Reviewed

    Description

      From the test result of a recent Jenkins nightly build: https://builds.apache.org/job/Hadoop-Hdfs-trunk/2663/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestDecommissioningStatus/testDecommissionStatus/

      The test failed because the number of under-replicated blocks was 4 instead of the expected 3.

      The log shows a stray block, which might have caused the failure:

      2015-12-23 00:42:05,820 [Block report processor] INFO  BlockStateChange (BlockManager.java:processReport(2131)) - BLOCK* processReport: blk_1073741825_1001 on node 127.0.0.1:57382 size 16384 does not belong to any file
      

      The block size of 16384 suggests that this block is left over from the sibling test case testDecommissionStatusAfterDNRestart. This can happen because the same MiniDFSCluster instance is reused between tests.

      The test implementation should do a better job isolating tests.
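
      To illustrate why a reused cluster can produce a count of 4 instead of 3, here is a toy model in plain Java. It does not use any Hadoop classes: FakeCluster, its methods, and the 4096 block size are hypothetical names and values chosen for illustration only. Only the 16384 size and the 4-versus-3 counts come from the report above.

```java
import java.util.ArrayList;
import java.util.List;

public class IsolationSketch {
    /** Hypothetical stand-in for a mini cluster; it only records block sizes. */
    static class FakeCluster {
        final List<Integer> blockSizes = new ArrayList<>();

        void writeFile(int blocks, int blockSize) {
            for (int i = 0; i < blocks; i++) {
                blockSizes.add(blockSize);
            }
        }

        int blockCount() {
            return blockSizes.size();
        }
    }

    /** Shared cluster: a block left behind by an earlier test inflates the count. */
    static int countWithSharedCluster() {
        FakeCluster shared = new FakeCluster();
        shared.writeFile(1, 16384); // leftover from the sibling test
        shared.writeFile(3, 4096);  // blocks the current test expects (size assumed)
        return shared.blockCount();
    }

    /** Fresh cluster per test: only the current test's blocks are visible. */
    static int countWithFreshCluster() {
        FakeCluster earlier = new FakeCluster();
        earlier.writeFile(1, 16384); // stays in the earlier test's own cluster
        FakeCluster fresh = new FakeCluster();
        fresh.writeFile(3, 4096);
        return fresh.blockCount();
    }

    public static void main(String[] args) {
        System.out.println("shared cluster: " + countWithSharedCluster());
        System.out.println("fresh cluster:  " + countWithFreshCluster());
    }
}
```

      In a real JUnit test the same idea would usually mean creating the cluster in a per-test setup method and shutting it down in the matching teardown, rather than once per class.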

      Another failure case occurs when the load factor comes into play and a block cannot find enough DataNodes on which to place its replicas. In this test, block placement should not consider the load factor:

      conf.setBoolean(DFSConfigKeys.DFS_NAMENODE_REPLICATION_CONSIDERLOAD_KEY, false);
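
      The effect of that setting can be sketched with a simplified model. This is not the actual HDFS block placement code: it assumes that, when load is considered, a node is rejected if its active transfer count exceeds some multiple of the cluster average. The method name eligibleTargets, the factor of 2.0, and the sample loads are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ConsiderLoadSketch {
    /**
     * Returns the indices of nodes that may receive a replica.
     * loads[i] is the number of active transfers on node i (hypothetical metric).
     */
    static List<Integer> eligibleTargets(int[] loads, boolean considerLoad, double factor) {
        double total = 0;
        for (int load : loads) {
            total += load;
        }
        // Assumed rule: a node is too busy if its load exceeds factor * average load.
        double maxLoad = factor * (total / loads.length);

        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < loads.length; i++) {
            if (!considerLoad || loads[i] <= maxLoad) {
                targets.add(i);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        int[] loads = {0, 0, 9}; // one busy node in a three-node cluster
        System.out.println("considerLoad=true:  " + eligibleTargets(loads, true, 2.0));
        System.out.println("considerLoad=false: " + eligibleTargets(loads, false, 2.0));
    }
}
```

      With these sample loads the average is 3 and the threshold is 6, so the busy node is skipped and only two targets remain. A block that needs three replicas would then stay under-replicated, which is the kind of timing-dependent result the test wants to avoid.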
      

      Attachments

        1. HDFS-9599.001.patch (3 kB, Yiqun Lin)
        2. HDFS-9599.002.patch (2 kB, Yiqun Lin)

            People

              Assignee: Yiqun Lin (linyiqun)
              Reporter: Wei-Chiu Chuang (weichiu)