Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-1885

Race condition in MiniDFSCluster shutdown

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.15.0
    • test
    • None

    Description

      Hudson has been sporadically failing tests that start- or follow tests that start- multiple datanodes in MiniDFSCluster, particularly on Solaris and Windows. The following appears to be at least partially responsible (much credit to Nigel for helping to discern this).

      A common error:

      java.io.IOException: Cannot remove data directory: /export/home/hudson/hudson/jobs/Hadoop-Nightly/workspace/trunk/build/test/data/dfs/data
      	at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:126)
      	at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:80)
      	at org.apache.hadoop.dfs.TestFsck.testFsckNonExistent(TestFsck.java:96)
      

      MiniDFSCluster starts multiple DataNodes by calling DataNode::createDataNode, which creates and starts a DataNode thread, assigns the instance to a static member, and returns the Runnable. Of course, each call from MiniDFSCluster overwrites this instance. Since DataNode::shutdown() calls join() on the same Thread, each subsequent join is essentially a noop after the first DataNode finishes. When MiniDFSCluster::shutdown() returns, it may not have released its resources, so the next MiniDFSCluster may fail to start.

      Attachments

        1. 1885.patch
          1 kB
          Christopher Douglas

        Activity

          People

            cdouglas Christopher Douglas
            cdouglas Christopher Douglas
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: