Details
Bug
Status: Closed
Major
Resolution: Fixed
2.7.1
None
Reviewed
Description
The test case TestBlockReplacement fails intermittently. Whenever it fails, the unit test log contains the following:
org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement
testDeletedBlockWhenAddBlockIsInEdit(org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement) Time elapsed: 8.764 sec <<< FAILURE!
java.lang.AssertionError: The block should be only on 1 datanode expected:<1> but was:<2>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:555)
    at org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement.testDeletedBlockWhenAddBlockIsInEdit(TestBlockReplacement.java:436)
The cause is that the block has not been completely deleted by the time testDeletedBlockWhenAddBlockIsInEdit checks it, so the number of datanodes holding the block is incorrect. The test waits a fixed amount of time for FsDatasetAsyncDiskService to delete the block, and that amount is not reliable:
LOG.info("replaceBlock: "
    + replaceBlock(block, (DatanodeInfo)sourceDnDesc,
        (DatanodeInfo)sourceDnDesc, (DatanodeInfo)destDnDesc));
// Waiting for the FsDatasetAsyncDsikService to delete the block
Thread.sleep(3000);
When I reduce the sleep to 1 second, the test always fails. The 3 seconds currently used in the test is an equally arbitrary value. We should replace the fixed sleep with a more reliable approach, such as polling for the expected state in the way testDecommision waits for the block to be replicated.
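As a rough illustration of the polling approach suggested above, here is a minimal, self-contained sketch. The class name PollingWait and its waitFor method are hypothetical and are not taken from the Hadoop code base or from the eventual patch. The idea is simply to re-check a condition at short intervals until it holds or a deadline passes, instead of sleeping for a fixed time.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: poll a condition instead of sleeping for a fixed time. */
public final class PollingWait {
    private PollingWait() {}

    /**
     * Re-checks the condition every checkEveryMillis until it returns true,
     * and throws TimeoutException if it is still false after timeoutMillis.
     */
    public static void waitFor(BooleanSupplier condition,
                               long checkEveryMillis,
                               long timeoutMillis)
            throws TimeoutException, InterruptedException {
        // Use the monotonic clock so wall-clock adjustments do not affect the deadline.
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                    "Condition not met within " + timeoutMillis + " ms");
            }
            Thread.sleep(checkEveryMillis);
        }
    }
}
```

In the test, the condition would presumably check that the block is reported on exactly one datanode, so the test proceeds as soon as the deletion completes and fails with a clear timeout otherwise. Hadoop's test utilities already include a comparable helper, GenericTestUtils.waitFor, which may be preferable to a new class.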