Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1808

TestBalancer waits forever, errs without giving information

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.23.0
    • Component/s: datanode, namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In three locations in the code, TestBalancer waits forever on a condition. Failures result in Hudson/Jenkins "Timeout occurred" error message with no information about where or why. Need to replace with TimeoutExceptions that throw a stack trace and useful info about the failure mode.

      In waitForHeartBeat(), it is waiting on an exact match for a value that may be coarsely quantized – i.e., significant deviation from the exact "expected" result may occur. Replace with an allowed range of result.

      1. 1808_TestBalancer_v4.patch
        8 kB
        Matt Foley
      2. 1808_TestBalancer_v3.patch
        9 kB
        Matt Foley
      3. TestBalancer.java.patch
        8 kB
        Matt Foley

        Activity

        Hide
        Matt Foley added a comment -

        Replace the three infinite waits with 20-second timeouts. Throw TimeoutException WITH useful information as to current state. Now if it errs in the future, at least we'll be able to see why.

        Replace the exact expected value with a bounded range (which can be set to "exact" if you really want that).

        Corrected a large number of comments that were out of date and incorrect.

        Test now passes Hudson automated testing in connection with HDFS-1295, where it previously failed.

        Posted here for information, but I'm going to subordinate this bug to HDFS-1295 (which it was blocking) and submit a single patch to that Jira.

        Show
        Matt Foley added a comment - Replace the three infinite waits with 20-second timeouts. Throw TimeoutException WITH useful information as to current state. Now if it errs in the future, at least we'll be able to see why. Replace the exact expected value with a bounded range (which can be set to "exact" if you really want that). Corrected a large number of comments that were out of date and incorrect. Test now passes Hudson automated testing in connection with HDFS-1295 , where it previously failed. Posted here for information, but I'm going to subordinate this bug to HDFS-1295 (which it was blocking) and submit a single patch to that Jira.
        Hide
        Matt Foley added a comment -

        Cleaned up a couple more things. Please review.

        Show
        Matt Foley added a comment - Cleaned up a couple more things. Please review.
        Hide
        Eli Collins added a comment -

        Change looks good. If you don't mind let's commit this patch independently so we track different changes in different jiras. Minor comments:

        Nits:

        • Rather than pass timeout and allowedRange to waitForHeartBeat I'd have them be constants in the test function, seems like the tests shouldn't require different values for these parameters.
        • Line 199 "0-0.01" should be "0-0.1"
        • Line 205 needs indenting
        Show
        Eli Collins added a comment - Change looks good. If you don't mind let's commit this patch independently so we track different changes in different jiras. Minor comments: Nits: Rather than pass timeout and allowedRange to waitForHeartBeat I'd have them be constants in the test function, seems like the tests shouldn't require different values for these parameters. Line 199 "0-0.01" should be "0-0.1" Line 205 needs indenting
        Hide
        Matt Foley added a comment -

        Incorporated Eli's suggestions.

        I do plan to commit this separately, in fact I broke the link from HDFS-1295 for all the unit test modules except TestDatanodeBlockScanner which is directly related to it.

        Show
        Matt Foley added a comment - Incorporated Eli's suggestions. I do plan to commit this separately, in fact I broke the link from HDFS-1295 for all the unit test modules except TestDatanodeBlockScanner which is directly related to it.
        Hide
        Eli Collins added a comment -

        +1 lgtm

        Show
        Eli Collins added a comment - +1 lgtm
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12476951/1808_TestBalancer_v4.patch
        against trunk revision 1095461.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.hdfs.TestFileConcurrentReader

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/394//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/394//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/394//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12476951/1808_TestBalancer_v4.patch against trunk revision 1095461. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: org.apache.hadoop.hdfs.TestFileConcurrentReader +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/394//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/394//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/394//console This message is automatically generated.
        Hide
        Matt Foley added a comment -

        The javadoc error is due to HDFS-1857, and unrelated to this patch.
        The unit test failure on TestFileConcurrentReader is unrelated to this patch.

        Ready to commit.

        Show
        Matt Foley added a comment - The javadoc error is due to HDFS-1857 , and unrelated to this patch. The unit test failure on TestFileConcurrentReader is unrelated to this patch. Ready to commit.
        Hide
        Eli Collins added a comment -

        I've committed this. Thanks Matt!

        Show
        Eli Collins added a comment - I've committed this. Thanks Matt!
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #609 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/609/)
        HDFS-1808. TestBalancer waits forever, errs without giving information. Contributed by Matt Foley

        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #609 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/609/ ) HDFS-1808 . TestBalancer waits forever, errs without giving information. Contributed by Matt Foley
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #650 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/650/)

        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #650 (See https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/650/ )

          People

          • Assignee:
            Matt Foley
            Reporter:
            Matt Foley
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development