Hadoop Common
  1. Hadoop Common
  2. HADOOP-5890

Use exponential backoff on Thread.sleep during DN shutdown

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None

      Description

      Tests waste a lot of time in DataNode.shutdown. Typical logs look like:

      2009-05-21 17:13:20,177 INFO  datanode.DataNode (DataNode.java:shutdown(637)) - Waiting for threadgroup to exit, active threads is 0
      2009-05-21 17:13:20,177 INFO  datanode.DataBlockScanner (DataBlockScanner.java:run(620)) - Exiting DataBlockScanner thread.
      2009-05-21 17:13:21,117 INFO  datanode.DataNode (DataNode.java:shutdown(637)) - Waiting for threadgroup to exit, active threads is 0
      

      In this example (and very commonly) the DataBlockScanner thread exits within 5-10ms after the first wait. The DN then sleeps an entire second before succeeding in shutting down.

      Using exponential backoff from a short value like 2ms up to a maximum of 1000ms would solve this.

        Activity

        Hide
        Todd Lipcon added a comment -

        Attached patch starts off sleeping 2ms and then exponentially decays by a factor of 1.5x each time. Feel free to twiddle the ratio before committing if you so desire.

        No additional tests since this is trivial and in the code path for every MiniMR test.

        Show
        Todd Lipcon added a comment - Attached patch starts off sleeping 2ms and then exponentially decays by a factor of 1.5x each time. Feel free to twiddle the ratio before committing if you so desire. No additional tests since this is trivial and in the code path for every MiniMR test.
        Hide
        Raghu Angadi added a comment -

        Looking at trunk, there should not have been any wait since 'active threads is 0', right?

        Show
        Raghu Angadi added a comment - Looking at trunk, there should not have been any wait since 'active threads is 0', right?
        Hide
        Raghu Angadi added a comment -

        If this is from a junit test, the two logs are likely from different datanodes.

        Show
        Raghu Angadi added a comment - If this is from a junit test, the two logs are likely from different datanodes.
        Hide
        Todd Lipcon added a comment -

        Woops, I pasted a bad example from the log... here's an example that actually demonstrates the behavior discussed:

        2009-05-21 22:43:21,259 INFO  datanode.DataNode (DataNode.java:shutdown(637)) - Waiting for threadgroup to exit, active threads is 1
        2009-05-21 22:43:21,259 WARN  datanode.DataNode (DataXceiverServer.java:run(137)) - DatanodeRegistration(127.0.0.1:40197, storageID=DS-2052133204-127.0.1.1-40197-1242971000238, infoPort=52207, ipcPort=52592):DataXceiveServer: java.nio.channels.AsynchronousCloseException
                at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
                at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152)
                at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
                at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
                at java.lang.Thread.run(Thread.java:619)
        
        2009-05-21 22:43:21,315 INFO  datanode.DataBlockScanner (DataBlockScanner.java:run(620)) - Exiting DataBlockScanner thread.
        2009-05-21 22:43:22,259 INFO  datanode.DataNode (DataNode.java:shutdown(637)) - Waiting for threadgroup to exit, active threads is 0
        

        Note the exact 1second offset between 22:43:21,259 and 22:43:22,259. This patch reduces that significantly.

        Show
        Todd Lipcon added a comment - Woops, I pasted a bad example from the log... here's an example that actually demonstrates the behavior discussed: 2009-05-21 22:43:21,259 INFO datanode.DataNode (DataNode.java:shutdown(637)) - Waiting for threadgroup to exit, active threads is 1 2009-05-21 22:43:21,259 WARN datanode.DataNode (DataXceiverServer.java:run(137)) - DatanodeRegistration(127.0.0.1:40197, storageID=DS-2052133204-127.0.1.1-40197-1242971000238, infoPort=52207, ipcPort=52592):DataXceiveServer: java.nio.channels.AsynchronousCloseException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152) at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84) at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130) at java.lang. Thread .run( Thread .java:619) 2009-05-21 22:43:21,315 INFO datanode.DataBlockScanner (DataBlockScanner.java:run(620)) - Exiting DataBlockScanner thread. 2009-05-21 22:43:22,259 INFO datanode.DataNode (DataNode.java:shutdown(637)) - Waiting for threadgroup to exit, active threads is 0 Note the exact 1second offset between 22:43:21,259 and 22:43:22,259. This patch reduces that significantly.
        Hide
        steve_l added a comment -

        +1 to anything that boosts setup/teardown speed

        Show
        steve_l added a comment - +1 to anything that boosts setup/teardown speed
        Hide
        dhruba borthakur added a comment -

        +1. Code looks good.

        Show
        dhruba borthakur added a comment - +1. Code looks good.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12408764/hadoop-5890.txt
        against trunk revision 778182.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/395/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/395/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/395/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/395/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12408764/hadoop-5890.txt against trunk revision 778182. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/395/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/395/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/395/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/395/console This message is automatically generated.
        Hide
        steve_l added a comment -

        Committed! Thanks Todd.

        Show
        steve_l added a comment - Committed! Thanks Todd.

          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Todd Lipcon
          • Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development