Description
Tests that start multiple datanodes in MiniDFSCluster, and tests that run after them, have been failing sporadically on Hudson, particularly on Solaris and Windows. The following appears to be at least partially responsible (much credit to Nigel for helping to discern this).
A common error:
java.io.IOException: Cannot remove data directory: /export/home/hudson/hudson/jobs/Hadoop-Nightly/workspace/trunk/build/test/data/dfs/data
    at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:126)
    at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:80)
    at org.apache.hadoop.dfs.TestFsck.testFsckNonExistent(TestFsck.java:96)
MiniDFSCluster starts multiple DataNodes by calling DataNode::createDataNode, which creates and starts a DataNode thread, assigns the instance to a static member, and returns the Runnable. Each call from MiniDFSCluster therefore overwrites the previously stored instance. Since DataNode::shutdown() calls join() on that same static Thread, each subsequent join is essentially a no-op after the first DataNode finishes. As a result, when MiniDFSCluster::shutdown() returns, the cluster may not have released its resources, so the next MiniDFSCluster may fail to start.
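The pattern can be reproduced outside Hadoop. The following is a minimal sketch only, not the DataNode code: Node, lastStarted, cleanupGate, and both shutdown methods are hypothetical names, and the shutdown order (last node first) is an assumption chosen to keep the demonstration deterministic. It shows a shutdown that joins a shared static Thread returning while an earlier node's thread is still running.

```java
import java.util.concurrent.CountDownLatch;

class StaticThreadJoinSketch {

    /** Hypothetical stand-in for a DataNode; not the Hadoop class. */
    static class Node implements Runnable {
        // Shared by all instances, so every create() call overwrites it.
        static Thread lastStarted;

        Thread ownThread;                       // per-instance reference
        final CountDownLatch stop = new CountDownLatch(1);
        final CountDownLatch cleanupGate;       // simulates slow resource release
        volatile boolean released = false;

        Node(CountDownLatch cleanupGate) {
            this.cleanupGate = cleanupGate;
        }

        static Node create(CountDownLatch cleanupGate) {
            Node n = new Node(cleanupGate);
            n.ownThread = new Thread(n);
            lastStarted = n.ownThread;          // previous thread reference is lost
            n.ownThread.start();
            return n;
        }

        @Override
        public void run() {
            try {
                stop.await();                   // run until asked to stop
                cleanupGate.await();            // then "release resources"
                released = true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Problematic variant: joins whichever thread was started last.
        void shutdownViaStatic() throws InterruptedException {
            stop.countDown();
            lastStarted.join();
        }

        // Safer variant: joins this node's own thread.
        void shutdownViaOwnThread() throws InterruptedException {
            stop.countDown();
            ownThread.join();
        }
    }

    /** True if the first node's thread is still alive after both static-join shutdowns return. */
    static boolean firstNodeOutlivesShutdown() {
        try {
            CountDownLatch slowGate = new CountDownLatch(1); // stays closed for now
            CountDownLatch openGate = new CountDownLatch(0); // already open
            Node first = Node.create(slowGate);
            Node second = Node.create(openGate);

            second.shutdownViaStatic(); // really waits: the static is second's thread
            first.shutdownViaStatic();  // joins second's dead thread, returns at once

            boolean stillAlive = first.ownThread.isAlive(); // blocked on slowGate

            slowGate.countDown();         // let the first node finish its cleanup
            first.shutdownViaOwnThread(); // waits for the right thread this time
            return stillAlive && first.released;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("first node alive after shutdown returned: "
                + firstNodeOutlivesShutdown());
    }
}
```

Keeping the Thread reference per instance, as in shutdownViaOwnThread(), is one possible way to make each shutdown wait for its own node; whether that suits DataNode depends on how the static member is used elsewhere.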