We have seen cases where MiniDFSCluster startup fails because initMiniDFSCluster() cannot delete the data_dir. Which test case hits this depends on when the build machine happens to be busy, so the failures show up in seemingly random tests. If we sleep for a few seconds and try again, it works most of the time. The surefire doc says:
After the test-set has completed, the process executes java.lang.System.exit(0) which starts shutdown hooks. At this point the process may run next 30 seconds until all non daemon Threads die. After the period of time has elapsed, the process kills itself by java.lang.Runtime.halt(0).
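The sleep-and-retry workaround mentioned above could be wrapped in a small helper like the one below. This is a hypothetical sketch, not Hadoop code: the class name `SleepAndRetry`, the `IoAction` interface, and the attempt/sleep parameters are all illustrative assumptions. It only papers over the race with the previous JVM's shutdown hook rather than removing it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of the sleep-and-retry workaround; not part of Hadoop.
final class SleepAndRetry {
    /** A cleanup step that may fail with an I/O error, e.g. deleting data_dir. */
    @FunctionalInterface
    interface IoAction {
        void run() throws IOException;
    }

    private SleepAndRetry() {}

    /**
     * Runs action up to maxAttempts times, sleeping sleepMillis between failed
     * attempts. Returns true on the first success, false if every attempt failed
     * or the thread was interrupted while sleeping.
     */
    static boolean run(IoAction action, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return true;
            } catch (IOException | UncheckedIOException e) {
                if (attempt == maxAttempts) {
                    return false; // give up after the last attempt
                }
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    return false;
                }
            }
        }
        return false;
    }
}
```

A caller might write `SleepAndRetry.run(() -> deleteDataDir(dataDir), 5, 2000L)`, where `deleteDataDir` is a hypothetical cleanup method. How long to sleep is exactly the guesswork that a unique base directory avoids.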
MiniDFSCluster#shutdown() registers base_dir to be deleted on shutdown. If that deletion is slow, the next test JVM can start running before the shutdown hook completes. One option is to force every test to call shutdown(true), but that can slow things down. Instead, each instance should get a random base_dir, so that the deletion through the shutdown hook and the setup of the next test can overlap.
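A minimal sketch of the "random base_dir per instance" idea is below, assuming plain `java.nio.file` rather than MiniDFSCluster's actual internals. The class `UniqueBaseDir`, its methods, and the `dfs-<UUID>` naming are hypothetical. Each instance creates a fresh directory, and recursive deletion is deferred to a JVM shutdown hook, mirroring the deferred deletion described for shutdown(). Because no two instances share a path, a slow hook from an earlier JVM never touches the directory a new cluster is setting up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of MiniDFSCluster: one unique base dir per instance.
final class UniqueBaseDir {
    private UniqueBaseDir() {}

    /** Creates a fresh, uniquely named directory under testRoot. */
    static Path create(Path testRoot) {
        try {
            Files.createDirectories(testRoot);
            // A random UUID means a new instance never reuses a directory that an
            // earlier JVM's shutdown hook may still be deleting.
            return Files.createDirectory(testRoot.resolve("dfs-" + UUID.randomUUID()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Defers recursive deletion of dir until the JVM shuts down. */
    static void deleteOnShutdown(Path dir) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> deleteRecursively(dir)));
    }

    /** Deletes dir and everything below it; a no-op if dir is already gone. */
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : paths) {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a test, `Path base = UniqueBaseDir.create(testRoot); UniqueBaseDir.deleteOnShutdown(base);` would give each cluster its own directory while still cleaning up at exit, without blocking the test on synchronous deletion.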
Steve Loughran mentioned this in
many buildups of test dirs now use something random, rather than a hard-coded path like "dfs". This includes minidfs cluster...which should improve parallelism on test runs.
Can we actually make sure each MiniDFSCluster gets a unique base directory?