Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    1. hdfs-1884.1.patch
      2 kB
      Aaron T. Myers
    2. hdfs-1884.0.patch
      2 kB
      Aaron T. Myers

      Issue Links

        Activity

        Matt Foley added a comment -

        The following test has failed on all or most PreCommit-HDFS-Build builds since #427, but not before that. #427 was the build for HDFS-1052.

        org.apache.hadoop.hdfs.TestDFSStorageStateRecovery.testBlockPoolStorageStates 427-446

        Todd Lipcon added a comment -

        Anyone working on this? It's a shame our build has been broken for a full 3 weeks since federation was merged.

        Aaron T. Myers added a comment -

        @Todd: I'll take a look at it.

        Aaron T. Myers added a comment -

        I took a look into this. It turns out the failure is due to the test process running out of available file descriptors (FDs). Each of these three test cases leaks FDs, but the two DN-related tests leak more than the NN test. If both DN tests are run, whichever runs second will fail at some random point when it runs out of available FDs.

        The other thing that made this tough to track down is that the exception which triggered the failure was sometimes being masked by two unnecessary "catch (Exception e) { // ignore }" blocks, and by a spurious NPE. I'll be attaching a patch shortly which addresses these issues.

        I tried running this test with the patch posted in HADOOP-7146 and can confirm that it passes reliably, and is no longer leaking FDs test-to-test.
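
To illustrate why an empty catch block makes this kind of failure hard to trace, here is a minimal sketch. It is not code from the patch: the class name, the method names, and the simulated "Too many open files" error are hypothetical stand-ins for a startup call that fails when FDs run out.

```java
import java.io.IOException;

public class SwallowedExceptionDemo {
    // Hypothetical stand-in for a startup call that fails, e.g. when the
    // process has no file descriptors left.
    static void startNode() throws IOException {
        throw new IOException("Too many open files");
    }

    // Anti-pattern: the failure is silently discarded, so the caller carries
    // on and only breaks later, somewhere unrelated to the real cause.
    static String startSwallowing() {
        try {
            startNode();
        } catch (Exception e) {
            // ignore
        }
        return "started"; // misleading: nothing actually started
    }

    // Alternative: declare the exception and let it propagate, so the test
    // fails at the point of the real error with the real message.
    static String startPropagating() throws IOException {
        startNode();
        return "started";
    }

    public static void main(String[] args) {
        System.out.println("swallowing variant reports: " + startSwallowing());
        try {
            startPropagating();
        } catch (IOException e) {
            System.out.println("propagating variant reports: " + e.getMessage());
        }
    }
}
```

In the swallowing variant the caller has no way to tell that startup failed, which matches the symptom described above: the test fails later at an apparently random point.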

        Aaron T. Myers added a comment -

        Patch which fixes up the test a little bit to make it easier to debug in the future.

        Todd Lipcon added a comment -

        So cluster.startDataNodes isn't throwing even though the test seems to claim it should be?

        Aaron T. Myers added a comment -

        It is not. Note that the test implies both that it should and should not be. The original version has assert(...) calls right after the statement that should theoretically be throwing, which wouldn't be reached if it were in fact throwing. Perhaps it should in fact be throwing, but in that case we should be putting fail(...) calls right after the cluster.startDataNodes(...) calls, and the asserts in the catch(...) clauses.
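
The two consistent ways of writing such a test can be sketched as follows. This is an illustration only, not the committed change: startDataNodes here is a hypothetical stand-in with an invented boolean parameter, not the real MiniDFSCluster method, and a thrown AssertionError plays the role of JUnit's fail(...) so the example has no dependencies.

```java
import java.io.IOException;

public class ExpectedFailureDemo {
    // Hypothetical stand-in for cluster.startDataNodes(...); the parameter
    // and the error message are assumptions made for this sketch.
    static void startDataNodes(boolean storageIsRecoverable) throws IOException {
        if (!storageIsRecoverable) {
            throw new IOException("storage state is not recoverable");
        }
    }

    // Case 1: startup is expected to succeed. There is no catch block, so any
    // exception propagates and fails the test; asserts follow the call.
    static boolean checkStartupSucceeds() throws IOException {
        startDataNodes(true);
        return true; // post-startup asserts would go here
    }

    // Case 2: startup is expected to throw. The fail(...) equivalent sits
    // right after the call, and the checks live in the catch clause.
    static String checkStartupFails() {
        try {
            startDataNodes(false);
            throw new AssertionError("expected startDataNodes to throw");
        } catch (IOException expected) {
            return expected.getMessage(); // asserts about the failure go here
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("success case passed: " + checkStartupSucceeds());
        System.out.println("failure case caught: " + checkStartupFails());
    }
}
```

Because AssertionError is not an IOException, the catch clause in the second case cannot swallow it, so a call that unexpectedly succeeds still fails the test.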

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12479960/hdfs-1884.0.patch
        against trunk revision 1128009.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/643//console

        This message is automatically generated.

        Aaron T. Myers added a comment -

        Rebased patch against trunk.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12480672/hdfs-1884.1.patch
        against trunk revision 1128393.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/648//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/648//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/648//console

        This message is automatically generated.

        Aaron T. Myers added a comment -

        The only test which failed was TestFiDataTransferProtocol2. I just ran this test locally on my box and it passed just fine. I'm betting the failure was unrelated.

        Todd Lipcon added a comment -

        +1. Committed to trunk. Thanks, Aaron.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #696 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/696/)
        HDFS-1884. Improve TestDFSStorageStateRecovery to properly throw in the case of errors. Contributed by Aaron T. Myers.

        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1128987
        Files :

        • /hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/TestDFSStorageStateRecovery.java
        • /hadoop/hdfs/trunk/CHANGES.txt
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #682 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/682/)
        HDFS-1884. Improve TestDFSStorageStateRecovery to properly throw in the case of errors. Contributed by Aaron T. Myers.

        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1128987
        Files :

        • /hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/TestDFSStorageStateRecovery.java
        • /hadoop/hdfs/trunk/CHANGES.txt

          People

          • Assignee:
            Aaron T. Myers
            Reporter:
            Matt Foley
          • Votes:
            0
            Watchers:
            3
