Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-4275

MiniDFSCluster-based tests fail on Windows due to failure to delete test namenode directory

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: trunk-win, 3.0.0-alpha1
    • Fix Version/s: 3.0.0-alpha1
    • Component/s: test
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Multiple HDFS test suites fail on Windows during initialization of MiniDFSCluster due to "Could not fully delete" the name testing data directory.

      1. HDFS-4275.1.patch
        2 kB
        Chris Nauroth

        Issue Links

          Activity

          Hide
          cnauroth Chris Nauroth added a comment -

          Resolution of this issue depends on HDFS-4274. (Fixing the problem in HDFS-4274 then reveals that many of the tests fail later due to the HDFS-4275 problem.)

          Show
          cnauroth Chris Nauroth added a comment - Resolution of this issue depends on HDFS-4274 . (Fixing the problem in HDFS-4274 then reveals that many of the tests fail later due to the HDFS-4275 problem.)
          Hide
          cnauroth Chris Nauroth added a comment -

          To put a number on that, I see 163 test suites that currently exhibit this problem on Windows.

          Show
          cnauroth Chris Nauroth added a comment - To put a number on that, I see 163 test suites that currently exhibit this problem on Windows.
          Hide
          cnauroth Chris Nauroth added a comment -

          Additionally, there are a few more test suites failing with similar but different error messages. There are 5 failing test suites due to "Cannot remove current directory", and 4 failing test suites due to "Could not delete hdfs directory". It's possible that these have different causes, but I'll track them all here for now and split out separate issues later if we identify different causes.

          Show
          cnauroth Chris Nauroth added a comment - Additionally, there are a few more test suites failing with similar but different error messages. There are 5 failing test suites due to "Cannot remove current directory", and 4 failing test suites due to "Could not delete hdfs directory". It's possible that these have different causes, but I'll track them all here for now and split out separate issues later if we identify different causes.
          Hide
          cnauroth Chris Nauroth added a comment -

          One more variant on the error message: "Cannot remove directory". There are 2 test suite failures for that one.

          Show
          cnauroth Chris Nauroth added a comment - One more variant on the error message: "Cannot remove directory". There are 2 test suite failures for that one.
          Hide
          cnauroth Chris Nauroth added a comment -

          This is partially caused by the test timeout in TestBalancerWithNodeGroup documented in HDFS-4261. After a test timeout, a MiniDFSCluster is left running. On Windows, it will hold open file handles on the test data directory, preventing subsequent tests from starting their own MiniDFSCluster. This means that a test timeout on Windows causes a failure not only for that test, but also for all subsequent MiniDFSCluster-based tests.

          Applying the HDFS-4261 patch moved the Windows test run forward, but then it hit a similar situation with TestBlockRecovery. This one appears to be Windows-specific.

          Show
          cnauroth Chris Nauroth added a comment - This is partially caused by the test timeout in TestBalancerWithNodeGroup documented in HDFS-4261 . After a test timeout, a MiniDFSCluster is left running. On Windows, it will hold open file handles on the test data directory, preventing subsequent tests from starting their own MiniDFSCluster . This means that a test timeout on Windows causes a failure not only for that test, but also for all subsequent MiniDFSCluster -based tests. Applying the HDFS-4261 patch moved the Windows test run forward, but then it hit a similar situation with TestBlockRecovery . This one appears to be Windows-specific.
          Hide
          cnauroth Chris Nauroth added a comment -

          I am attaching a patch to resolve the remaining issues after resolution of HDFS-4261.

          The only remaining issue is that TestBlockRecovery#testRaceBetweenReplicaRecoveryAndFinalizeBlock calls tearDown twice: once explicitly at the start of the test and once implicitly by JUnit after the test finishes. On Windows, the double tear-down is problematic. The second delete fails, causing the test to fail. I've added a flag to track whether or not tearDown has already executed for a test and prevent double execution.

          Show
          cnauroth Chris Nauroth added a comment - I am attaching a patch to resolve the remaining issues after resolution of HDFS-4261 . The only remaining issue is that TestBlockRecovery#testRaceBetweenReplicaRecoveryAndFinalizeBlock calls tearDown twice: once explicitly at the start of the test and once implicitly by JUnit after the test finishes. On Windows, the double tear-down is problematic. The second delete fails, causing the test to fail. I've added a flag to track whether or not tearDown has already executed for a test and prevent double execution.
          Hide
          cnauroth Chris Nauroth added a comment -

          Adding 3.0.0 to Affects Version/s and Target Version/s. This patch can commit to trunk and then merge to branch-trunk-win.

          Show
          cnauroth Chris Nauroth added a comment - Adding 3.0.0 to Affects Version/s and Target Version/s. This patch can commit to trunk and then merge to branch-trunk-win.
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12559950/HDFS-4275.1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3622//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3622//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12559950/HDFS-4275.1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3622//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3622//console This message is automatically generated.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          +1

          Just out of curiosity, why is the second tear down problematic on Windows?

          Show
          arpitagarwal Arpit Agarwal added a comment - +1 Just out of curiosity, why is the second tear down problematic on Windows?
          Hide
          xgong Xuan Gong added a comment -

          +1 Looks good

          Show
          xgong Xuan Gong added a comment - +1 Looks good
          Hide
          cnauroth Chris Nauroth added a comment -

          Thank you, Arpit and Xuan.

          The double tearDown is problematic, because it causes a race condition on trying to delete an open directory, ultimately failing the assertion in tearDown with "Cannot delete data-node dirs". This fails specifically on Windows because of its file locking policy.

          The file in question is the same verification log mentioned in HDFS-4274. The patch I uploaded there takes care of closing the file, but it happens in a separate thread started by DataBlockScanner. This is asynchronous of the main JUnit thread, so we don't know deterministically when the file will be closed.

          For other tests, this has not been a problem. Presumably, that is because other tests consistently use MiniDFSCluster to manage this. TestBlockRecovery is a bit different in that most of its tests start a DataNode directly, but then testRaceBetweenReplicaRecoveryAndFinalizeBlock immediately shuts it down and uses a MiniDFSCluster instead.

          Show
          cnauroth Chris Nauroth added a comment - Thank you, Arpit and Xuan. The double tearDown is problematic, because it causes a race condition on trying to delete an open directory, ultimately failing the assertion in tearDown with "Cannot delete data-node dirs". This fails specifically on Windows because of its file locking policy. The file in question is the same verification log mentioned in HDFS-4274 . The patch I uploaded there takes care of closing the file, but it happens in a separate thread started by DataBlockScanner . This is asynchronous of the main JUnit thread, so we don't know deterministically when the file will be closed. For other tests, this has not been a problem. Presumably, that is because other tests consistently use MiniDFSCluster to manage this. TestBlockRecovery is a bit different in that most of its tests start a DataNode directly, but then testRaceBetweenReplicaRecoveryAndFinalizeBlock immediately shuts it down and uses a MiniDFSCluster instead.
          Hide
          sureshms Suresh Srinivas added a comment -

          Chris, thanks for describing the issue. Avoiding teardown twice makes senses. +1 for the patch.

          Show
          sureshms Suresh Srinivas added a comment - Chris, thanks for describing the issue. Avoiding teardown twice makes senses. +1 for the patch.
          Hide
          sureshms Suresh Srinivas added a comment -

          I committed the patch to trunk.

          Thank you Chris. Thank you Apri and Xuan for the review.

          Show
          sureshms Suresh Srinivas added a comment - I committed the patch to trunk. Thank you Chris. Thank you Apri and Xuan for the review.
          Hide
          hudson Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3125 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3125/)
          HDFS-4275. MiniDFSCluster-based tests fail on Windows due to failure to delete test namenode directory. Contributed by Chris Nauroth. (Revision 1422278)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1422278
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
          Show
          hudson Hudson added a comment - Integrated in Hadoop-trunk-Commit #3125 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3125/ ) HDFS-4275 . MiniDFSCluster-based tests fail on Windows due to failure to delete test namenode directory. Contributed by Chris Nauroth. (Revision 1422278) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1422278 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
          Hide
          hudson Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #67 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/67/)
          HDFS-4275. MiniDFSCluster-based tests fail on Windows due to failure to delete test namenode directory. Contributed by Chris Nauroth. (Revision 1422278)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1422278
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
          Show
          hudson Hudson added a comment - Integrated in Hadoop-Yarn-trunk #67 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/67/ ) HDFS-4275 . MiniDFSCluster-based tests fail on Windows due to failure to delete test namenode directory. Contributed by Chris Nauroth. (Revision 1422278) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1422278 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
          Hide
          hudson Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1256 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1256/)
          HDFS-4275. MiniDFSCluster-based tests fail on Windows due to failure to delete test namenode directory. Contributed by Chris Nauroth. (Revision 1422278)

          Result = FAILURE
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1422278
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
          Show
          hudson Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1256 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1256/ ) HDFS-4275 . MiniDFSCluster-based tests fail on Windows due to failure to delete test namenode directory. Contributed by Chris Nauroth. (Revision 1422278) Result = FAILURE suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1422278 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
          Hide
          hudson Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1287 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1287/)
          HDFS-4275. MiniDFSCluster-based tests fail on Windows due to failure to delete test namenode directory. Contributed by Chris Nauroth. (Revision 1422278)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1422278
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java
          Show
          hudson Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1287 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1287/ ) HDFS-4275 . MiniDFSCluster-based tests fail on Windows due to failure to delete test namenode directory. Contributed by Chris Nauroth. (Revision 1422278) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1422278 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java

            People

            • Assignee:
              cnauroth Chris Nauroth
              Reporter:
              cnauroth Chris Nauroth
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development