Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-1054

Remove unnecessary sleep after failure in nextBlockOutputStream

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.3, 0.20-append, 0.21.0, 0.22.0
    • Fix Version/s: 0.20-append, 0.20.205.0, 0.21.0
    • Component/s: hdfs-client
    • Labels:
      None

      Description

      If DFSOutputStream fails to create a pipeline, it currently sleeps 6 seconds before retrying. I don't see a great reason to wait at all, much less 6 seconds (especially now that HDFS-630 ensures that a retry won't go back to the bad node). We should at least make it configurable, and perhaps something like backoff makes some sense.

      1. hdfs-1054-0.20-append.txt
        1 kB
        Todd Lipcon
      2. hdfs-1054.txt
        2 kB
        Todd Lipcon
      3. hdfs-1054.txt
        2 kB
        Todd Lipcon
      4. HDFS-1054.20-security.1.patch
        1 kB
        Jitendra Nath Pandey

        Activity

        Todd Lipcon created issue -
        Hide
        Todd Lipcon added a comment -

        FYI the situation where I'm running into this is an hbase edits stress test where I'm killing a DN that's local to one of the region servers. That region server immediately starts acting up because all file creates take an extra 6 seconds (it still thinks the local DN is up until the NN marks it down). In this case it's getting an immediate "Connection Refused" from the local DN anyway, and the second attempt always works fine since it makes it into the HDFS-630 excludedNodes list

        Show
        Todd Lipcon added a comment - FYI the situation where I'm running into this is an hbase edits stress test where I'm killing a DN that's local to one of the region servers. That region server immediately starts acting up because all file creates take an extra 6 seconds (it still thinks the local DN is up until the NN marks it down). In this case it's getting an immediate "Connection Refused" from the local DN anyway, and the second attempt always works fine since it makes it into the HDFS-630 excludedNodes list
        Hide
        Todd Lipcon added a comment -

        Here's a patch which removes the sleep altogether. I looked through the svn logs and this sleep has been there since Nutch, and I can't think of any particular reason why it's useful.

        If someone can think of a time when it actually is a good idea to sleep, I'll upload a patch that makes it configurable, but otherwise I think we should just remove it.

        Show
        Todd Lipcon added a comment - Here's a patch which removes the sleep altogether. I looked through the svn logs and this sleep has been there since Nutch, and I can't think of any particular reason why it's useful. If someone can think of a time when it actually is a good idea to sleep, I'll upload a patch that makes it configurable, but otherwise I think we should just remove it.
        Todd Lipcon made changes -
        Field Original Value New Value
        Attachment hdfs-1054.txt [ 12440729 ]
        Todd Lipcon made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Todd Lipcon made changes -
        Summary Make sleep after failure in nextBlockOutputStream smarter and configurable Remove unnecessary sleep after failure in nextBlockOutputStream
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12440729/hdfs-1054.txt
        against trunk revision 930967.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/298/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/298/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/298/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/298/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12440729/hdfs-1054.txt against trunk revision 930967. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/298/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/298/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/298/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/298/console This message is automatically generated.
        Hide
        Todd Lipcon added a comment -

        -1 tests included

        Not sure how I would test this - it's very difficult to test for the lack of a sleep

        Show
        Todd Lipcon added a comment - -1 tests included Not sure how I would test this - it's very difficult to test for the lack of a sleep
        Hide
        Jakob Homan added a comment -

        Patch looks good except for unnecessary changes to log output (combining two log entries into one). We've not nailed down the log output format yet, but there's still no need to change it without need, in case someone is currently parsing its output.

        Show
        Jakob Homan added a comment - Patch looks good except for unnecessary changes to log output (combining two log entries into one). We've not nailed down the log output format yet, but there's still no need to change it without need, in case someone is currently parsing its output.
        Hide
        Jakob Homan added a comment -

        Canceling pending requested revision.

        Show
        Jakob Homan added a comment - Canceling pending requested revision.
        Jakob Homan made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Todd Lipcon added a comment -

        I combined the log entries because I think the bad datanode should be INFO level. Would you prefer that I just change that message to INFO rather than combining them at the same time? Given that the abandonment is a direct result of the bad node, I think it makes more sense to have one message. Also, the node exclusion log is from HDFS-630 which hasn't been released yet, so I don't think we risk that anyone is parsing it.

        Show
        Todd Lipcon added a comment - I combined the log entries because I think the bad datanode should be INFO level. Would you prefer that I just change that message to INFO rather than combining them at the same time? Given that the abandonment is a direct result of the bad node, I think it makes more sense to have one message. Also, the node exclusion log is from HDFS-630 which hasn't been released yet, so I don't think we risk that anyone is parsing it.
        Hide
        Jakob Homan added a comment -

        Changing to INFO should work fine.

        Show
        Jakob Homan added a comment - Changing to INFO should work fine.
        Hide
        Todd Lipcon added a comment -

        New patch addresses Jakob's review comments. Thanks!

        Show
        Todd Lipcon added a comment - New patch addresses Jakob's review comments. Thanks!
        Todd Lipcon made changes -
        Attachment hdfs-1054.txt [ 12442793 ]
        Todd Lipcon made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12442793/hdfs-1054.txt
        against trunk revision 937855.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/160/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/160/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/160/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/160/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12442793/hdfs-1054.txt against trunk revision 937855. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/160/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/160/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/160/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/160/console This message is automatically generated.
        Hide
        Jakob Homan added a comment -

        Thanks for the revision. I can't reproduce the failed test. Contrib failure is unrelated. +1.
        I've committed this. Thanks Todd. Resolving as fixed.

        Show
        Jakob Homan added a comment - Thanks for the revision. I can't reproduce the failed test. Contrib failure is unrelated. +1. I've committed this. Thanks Todd. Resolving as fixed.
        Jakob Homan made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Todd Lipcon added a comment -

        Marking for 0.20-append

        Show
        Todd Lipcon added a comment - Marking for 0.20-append
        Todd Lipcon made changes -
        Fix Version/s 0.20-append [ 12315103 ]
        Hide
        Todd Lipcon added a comment -

        Reopening for commit to append branch

        Show
        Todd Lipcon added a comment - Reopening for commit to append branch
        Todd Lipcon made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Hide
        Todd Lipcon added a comment -

        Attaching patch for 0.20-append branch. It's just slightly different than what went into trunk (did less cleanup to minimize change). No unit test but has been cluster tested for last several months with lots of failure testing.

        Show
        Todd Lipcon added a comment - Attaching patch for 0.20-append branch. It's just slightly different than what went into trunk (did less cleanup to minimize change). No unit test but has been cluster tested for last several months with lots of failure testing.
        Todd Lipcon made changes -
        Attachment hdfs-1054-0.20-append.txt [ 12446919 ]
        dhruba borthakur made changes -
        Affects Version/s 0.20-append [ 12315103 ]
        Affects Version/s 0.20.3 [ 12314814 ]
        Affects Version/s 0.21.0 [ 12314046 ]
        Hide
        dhruba borthakur added a comment -

        I committed it into the hadoop-0.20-append branch (as indicated by the affected version) but reverted the fix-version to the original value of 0.21.

        Show
        dhruba borthakur added a comment - I committed it into the hadoop-0.20-append branch (as indicated by the affected version) but reverted the fix-version to the original value of 0.21.
        dhruba borthakur made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Fix Version/s 0.21.0 [ 12314046 ]
        Fix Version/s 0.20-append [ 12315103 ]
        Resolution Fixed [ 1 ]
        Todd Lipcon made changes -
        Fix Version/s 0.20-append [ 12315103 ]
        Tom White made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Suresh Srinivas made changes -
        Fix Version/s 0.20.205.0 [ 12316392 ]
        Suresh Srinivas made changes -
        Fix Version/s 0.20.205.0 [ 12316392 ]
        Hide
        Jitendra Nath Pandey added a comment -

        Patch for 20-security branch.

        Show
        Jitendra Nath Pandey added a comment - Patch for 20-security branch.
        Jitendra Nath Pandey made changes -
        Attachment HDFS-1054.20-security.1.patch [ 12492802 ]
        Hide
        Suresh Srinivas added a comment -

        +1 for the patch. I committed the patch to 0.20-security branch.

        Show
        Suresh Srinivas added a comment - +1 for the patch. I committed the patch to 0.20-security branch.
        Suresh Srinivas made changes -
        Fix Version/s 0.20.205.0 [ 12316392 ]

          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Todd Lipcon
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development