Hadoop Common
HADOOP-2232

Add option to disable Nagle's algorithm in the IPC Server

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.16.0
    • Component/s: ipc
    • Labels:
      None

      Description

      While investigating HBase performance, I found a bottleneck caused by
      Nagle's algorithm. For some reads I would get a bi-modal distribution
      of read times, with about half the times around 20 ms and half
      around 200 ms. I tracked this down to the well-known interaction
      between Nagle's algorithm and TCP delayed acknowledgments.

      I found that calling setTcpNoDelay(true) on the server's socket
      connection dropped all of my read times back to a constant 20 ms.

      I propose a patch to make this TCP_NODELAY option configurable. The
      attached patch allows one to set the TCP_NODELAY option on both the
      client and the server side. Currently it defaults to false
      (i.e., with Nagle's algorithm enabled).
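
      A sketch of how the resulting configuration might look in
      hadoop-site.xml; ipc.server.tcpnodelay is the key named later in this
      thread, and ipc.client.tcpnodelay is assumed here as the matching
      client-side key:

```xml
<!-- Sketch of a hadoop-site.xml fragment. ipc.server.tcpnodelay appears
     later in this thread; ipc.client.tcpnodelay is assumed to be the
     matching client-side key added by the patch. Both default to false. -->
<configuration>
  <property>
    <name>ipc.server.tcpnodelay</name>
    <value>true</value>
  </property>
  <property>
    <name>ipc.client.tcpnodelay</name>
    <value>true</value>
  </property>
</configuration>
```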

      To see the effect, I have included a test which provokes the issue by
      sending a MapWritable over an IPC call. On my machine this test shows
      a speedup of 117 times when using TCP_NODELAY.

      These tests were done on OS X 10.4. Your mileage may vary with other
      TCP/IP implementation stacks.
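
      The fix boils down to java.net.Socket#setTcpNoDelay. A minimal,
      self-contained sketch (plain sockets, not the Hadoop IPC classes)
      showing the option being set on both ends of a connection:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class TcpNoDelayDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            int port = server.getLocalPort();

            try (Socket client = new Socket("127.0.0.1", port);
                 Socket accepted = server.accept()) {
                // Disable Nagle's algorithm on both ends, as the patch
                // does for the IPC client and server connections.
                client.setTcpNoDelay(true);
                accepted.setTcpNoDelay(true);

                System.out.println("client TCP_NODELAY=" + client.getTcpNoDelay());
                System.out.println("server TCP_NODELAY=" + accepted.getTcpNoDelay());
            }
        }
    }
}
```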

      1. 2232-3.patch
        4 kB
        Chris Douglas
      2. HADOOP-2232-1.patch
        9 kB
        Clint Morgan
      3. HADOOP-2232-2.patch
        4 kB
        Clint Morgan

        Activity

        Chris Douglas added a comment -

        Patch looks good; I've seen disabling Nagle work well for other RPC systems and like the idea of permitting it here.

        The unit test fails if TcpNoDelay doesn't improve performance by at least 2x. I'm uncertain what sorts of regressions we'd prevent by including it, particularly since it's using a custom protocol. Did you have something in mind, or did you include it as a useful illustration of the concept?

        Clint Morgan added a comment -

        Yeah, that test was just to illustrate the issue, and not for inclusion.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12369828/HADOOP-2232-1.patch
        against trunk revision r596495.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests -1. The patch failed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1122/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1122/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1122/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1122/console

        This message is automatically generated.

        stack added a comment -

        Clint, can you redo your patch so it doesn't include a test that fails? Otherwise, +1 on the patch. It looks good.

        Am running some hbase tests to see difference before and after patch. Will report back when done.

        Clint Morgan added a comment -

        Same patch, but with the test removed.

        However, the test failure was in an unrelated test.

        I don't think TCP_NODELAY will affect the current PerformanceEvaluation. It only affected my results with getRow() when the results were of particular sizes.

        However, if people notice consistently high response latencies, it could be a sign of a bad interaction between Nagle's algorithm and delayed ACKs that TCP_NODELAY would help.

        Raghu Angadi added a comment -

        Are there any disadvantages of enabling this option by default?

        stack added a comment -

        Dang. So this IPC switch-flipping ain't the silver bullet that's going to fix all HBase performance issues?

        I did rough timings using sequentialRead in PE. Setting ipc.server.tcpnodelay to true made the test run much slower (cells are 1k in size).

        Otherwise, +1 on the patch after review and testing on my little cluster. Seems like an option that will be important for certain loadings.

        Clint Morgan added a comment -

        As I understand it, a possible disadvantage is increased network traffic and bandwidth use. Disabling Nagle's algorithm means we send packets for small amounts of data, so we spend more bandwidth on packet headers.
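
        A rough back-of-the-envelope illustration of that overhead (assuming 20-byte IPv4 and 20-byte TCP headers with no options; the write sizes are made up):

```java
public class NagleOverhead {
    public static void main(String[] args) {
        final int headerBytes = 40; // 20-byte IPv4 + 20-byte TCP header, no options

        // With TCP_NODELAY, each small write can go out as its own segment.
        int payload = 100;          // bytes per application write (made-up figure)
        int writes = 10;

        int separate  = writes * (payload + headerBytes); // one segment per write
        int coalesced = writes * payload + headerBytes;   // Nagle coalesces into one segment

        System.out.println("separate segments: " + separate + " bytes on the wire");
        System.out.println("coalesced:         " + coalesced + " bytes on the wire");
    }
}
```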

        Owen O'Malley added a comment -

        Mukund,
        Can you please run a 500 node sort and look for task failures and execution time degradations? Thanks!

        Mukund Madhugiri added a comment -

        Finally getting around to run a 500 node sort benchmark. I tried to apply the patch to trunk and it fails. Is it possible to upload a new patch that works with trunk? Thanks

        Chris Douglas added a comment -

        Merged patch with latest trunk

        Chris Douglas added a comment -

        Submitting re-merged patch for Hudson

        Mukund Madhugiri added a comment -

        Here is data from the Sort benchmark run on 500 nodes. Sort validation could not be compared as it is broken on trunk (HADOOP-2646).

        NOTE: The trunk run is 4 days old and will have new data on latest trunk tomorrow.

        500 nodes             trunk   trunk + patch   Difference (%)
        randomWriter (mins)    24          28             17.9%
        sort (mins)            91         113             23%

        I see exceptions of this kind in the JT logs:
        2008-01-22 22:21:35,162 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200801222207_0001_m_000989_0: java.io.IOException: All datanodes are bad. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:1831)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1100(DFSClient.java:1479)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1571)

        2008-01-22 23:09:01,629 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200801222207_0002_m_036186_0: java.net.UnknownHostException: unknown host: <hostname>
        at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:142)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:568)
        at org.apache.hadoop.ipc.Client.call(Client.java:501)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
        at org.apache.hadoop.dfs.$Proxy1.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:291)
        at org.apache.hadoop.dfs.DFSClient.createNamenode(DFSClient.java:127)
        at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:143)
        at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:64)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:164)

        dhruba borthakur added a comment -

        The exception messages are related to HADOOP-1707. This JIRA changed the error recovery model. Earlier, the client used to cache the entire disk block; when it was full, it uploaded the entire block to a pipeline of datanodes. If the upload to the first datanode succeeded, the operation was deemed successful.

        In the new model, the client uploads the block to all datanodes in the pipeline. In case of error, the client establishes a new pipeline (by removing the bad datanode(s)) and resends the outstanding data for the block. This change means that a client is now more likely to detect datanode failures.
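
        A highly simplified sketch of that recovery model (hypothetical names, not the real DFSClient code): drop the bad datanode, rebuild the pipeline, and resend the outstanding data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the recovery model described above; the real
// logic lives in DFSClient$DFSOutputStream.processDatanodeError.
public class PipelineSketch {
    static List<String> recover(List<String> pipeline, String badNode) {
        List<String> next = new ArrayList<>(pipeline);
        next.remove(badNode);        // exclude the failed datanode
        if (next.isEmpty()) {
            // Mirrors the "All datanodes are bad" failure in the logs above.
            throw new IllegalStateException("All datanodes are bad. Aborting...");
        }
        return next;                 // resend outstanding packets to this pipeline
    }

    public static void main(String[] args) {
        List<String> pipeline = List.of("dn1", "dn2", "dn3");
        System.out.println(recover(pipeline, "dn2")); // [dn1, dn3]
    }
}
```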

        My question then is: do you see any of these exceptions when you run your test on trunk without the patch for this JIRA?

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12373779/2232-3.patch
        against trunk revision r614413.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1682/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1682/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1682/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1682/console

        This message is automatically generated.

        Mukund Madhugiri added a comment -

        Ok. Here is the data from the latest trunk. I did a run last night on trunk and see the exceptions there as well. Here are the new trunk numbers; comparing them with the trunk + patch numbers, this patch looks good.

        500 nodes             trunk   trunk + patch   Difference (%)
        randomWriter (mins)    29          28             -1.8%
        sort (mins)           110         113              2%

        Nigel Daley added a comment -

        This should have been assigned to 0.16. It was Patch Available before the feature freeze.

        Chris Douglas added a comment -

        I just committed this. Thanks, Clint!

        Hudson added a comment -

        Integrated in Hadoop-trunk #379 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/379/)

        Hudson added a comment -

        Integrated in Hadoop-trunk #645 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/645/)
        HADOOP-4525. Fix ipc.server.ipcnodelay originally missed in .
        Contributed by Clint Morgan.


          People

          • Assignee: Clint Morgan
          • Reporter: Clint Morgan
