Hadoop Common / HADOOP-1849

IPC server max queue size should be configurable

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.20.2
    • Component/s: ipc
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently the max queue size for the IPC server is set to (100 * handlers). Usually when RPC failures are observed (e.g. HADOOP-1763), we increase the number of handlers and the problem goes away. I think a big part of such a fix is the increase in max queue size. I think we should make maxQsize per handler configurable (with a bigger default than 100). There are other possible improvements as well (HADOOP-1841).

      The server keeps reading RPC requests from clients. When the number of in-flight RPCs is larger than maxQsize, the earliest RPCs are deleted. This is the main feedback the server has for the client. I have often heard from users that Hadoop doesn't handle bursty traffic.

      Say the handler count is 10 (the default) and the server can handle 1000 RPCs a second (quite conservative/low for a typical server); this implies an RPC can wait only 1 second before it is dropped. If there are 3000 clients and all of them send RPCs around the same time (not very rare, with heartbeats etc.), 2000 will be dropped. Instead of dropping the earliest RPCs, if the server delayed reading new RPCs, the feedback to clients would be much smoother. I will file another jira regarding queue management.

      For this jira I propose to make the queue size per handler configurable, with a larger default (maybe 500). A minimal sketch of the idea follows.
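      To make the proposal concrete, here is a minimal sketch of how an IPC server could derive its call-queue capacity from such a setting. This is illustrative only, not the committed patch: the class shape and the Call placeholder are assumptions, and the key name anticipates the "ipc.server.handler.queue.size" that the thread below settles on.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.hadoop.conf.Configuration;

// Sketch only: queue capacity still scales with the handler count,
// but the per-handler factor is read from configuration instead of
// being hard-coded to 100.
public abstract class QueueSizeSketch {

  /** Placeholder for the IPC server's internal Call object. */
  static final class Call {}

  // Key name as settled on later in this thread; treat as assumption here.
  static final String QUEUE_SIZE_PER_HANDLER_KEY = "ipc.server.handler.queue.size";
  static final int QUEUE_SIZE_PER_HANDLER_DEFAULT = 100;

  private final BlockingQueue<Call> callQueue;

  protected QueueSizeSketch(Configuration conf, int handlerCount) {
    int perHandler = conf.getInt(QUEUE_SIZE_PER_HANDLER_KEY,
                                 QUEUE_SIZE_PER_HANDLER_DEFAULT);
    // Bounded queue: once full, incoming calls must be dropped or deferred.
    this.callQueue = new LinkedBlockingQueue<Call>(perHandler * handlerCount);
  }
}
```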

      Attachments

      1. handlerQueueSizeConfig.patch
        3 kB
        Konstantin Shvachko
      2. handlerQueueSizeConfig.patch
        4 kB
        Konstantin Shvachko
      3. handlerQueueSizeConfig.patch
        3 kB
        Konstantin Shvachko
      4. handlerQueueSizeConfig.patch
        3 kB
        Konstantin Shvachko
      5. handlerQueueSizeConfig-0.20.patch
        1 kB
        Konstantin Shvachko
      6. handlerQueueSizeConfig-0.21.patch
        3 kB
        Konstantin Shvachko

        Activity

        Raghu Angadi added a comment -

        Also, this is quite easy to test, with a large enough cluster of course.
        Owen O'Malley added a comment -

        The 100*handlers cap is just there as an upper bound on memory. Have you observed it actually triggering? (It is a different message.) I have not. Timeouts, yes, but not the queue-length capping. I deliberately chose a very high upper bound to make sure it didn't happen randomly. I think introducing a config variable is a bad idea. Having more handlers does make a server more responsive under load, but I doubt it has anything to do with the queue length.
        Doug Cutting added a comment -

        Let's experiment before we commit to a new parameter. If increasing the per-handler queue constant to, e.g., 200 fixes things for now, that would be preferable. HADOOP-1841 will alter the meaning of the handler count, and HADOOP-1850 will change the way it is set. So it would be a mistake to base new parameters on it at this point.
        Raghu Angadi added a comment -

        The server log for HADOOP-1763 would have been very useful for this. As far as I remember, Dhruba looked for "dropping because max q reached" messages while working on scalability improvements for the Namenode. When these messages went away, that was a good indicator of improvement. With a large cluster this is pretty easy to test.

        Yes, memory should also be a concern, though increasing handlers brings the same memory increase plus memory for each of the threads (maybe 512k of virtual memory per thread). Datanode blockReports are one example where each RPC takes a lot of memory.
        Doug Cutting added a comment -

        If 500 proves a better value then, again, I would prefer we just change that constant for now rather than introduce a new parameter.
        Raghu Angadi added a comment -

        At least while testing, if this is configurable, it would be easy to ask users to experiment with different values.
        Christian Kunz added a comment -

        As part of HADOOP-1874 (a job running on a 1400-node cluster with 60 rpc handlers for both namenode and jobtracker), we see many call queue overflows on both namenode and jobtracker, resulting in an escalation of lost tasktrackers.
        Konstantin Shvachko added a comment -

        We are having the same discussion now about handlers vs. queue size. The scale is different, but the discussion is the same as 2.5 years ago. I am going to create an undocumented configuration variable for the queue size, so that people can experiment.
        Hong Tang added a comment -

        > Server keeps reading RPC requests from clients. When the number of in-flight RPCs is larger than maxQsize, the earliest RPCs are deleted. This is the main feedback the server has for the client. I have often heard from users that Hadoop doesn't handle bursty traffic.

        This means the server still wastes resources on reading bytes and constructing objects for a request that is then dropped. Also, do we check, right before an rpc handler processes a request, whether the client has already abandoned it?
        Konstantin Shvachko added a comment -

        I introduced "ipc.server.listen.queue.size", which defines how many calls per handler are allowed in the queue. The default is still 100, so there is no change for current users.
        Suresh Srinivas added a comment -

        Looks like the new parameter introduced is "ipc.server.handler.queue.size" and not "ipc.server.listen.queue.size".

        Comments:

        1. core-default.xml is not updated to include this parameter.
        2. IPC_SERVER_HANDLER_QUEUE_SIZE_DEFAUL has a typo: it is missing a T at the end. (The corrected pair is sketched after this list.)
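        For reference, a sketch of the corrected key/default constant pair. The names follow the review comment above and the parameter name quoted in this thread; the declaring class is not shown here, so the holder class below is illustrative.

```java
// Illustrative holder class; the real declaring class is not shown in this thread.
public final class IpcConfigKeysSketch {
  public static final String IPC_SERVER_HANDLER_QUEUE_SIZE_KEY =
      "ipc.server.handler.queue.size";
  // 100 calls per handler, the unchanged default discussed above.
  // Note the trailing T that the review comment flagged as missing.
  public static final int IPC_SERVER_HANDLER_QUEUE_SIZE_DEFAULT = 100;

  private IpcConfigKeysSketch() {}  // constants only, no instances
}
```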
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12435643/handlerQueueSizeConfig.patch
        against trunk revision 909806.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/357/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/357/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/357/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/357/console

        This message is automatically generated.

        Suresh Srinivas added a comment -

        Introducing a configurable parameter in core-default.xml makes it common to both mapreduce and hdfs in a shared configuration setup. We should introduce this parameter in the hdfs config files to keep it independent, much like is done with the number of handlers.
        Konstantin Shvachko added a comment -

        I intentionally did not put it into any of the *-default.xml files. I think it should be an undocumented parameter, to prevent people from changing it unknowingly. There are two main reasons for that:

        1. We still don't know what a reasonable default value is.
        2. The optimal rpc queue size may be different for different applications. HDFS may work better with one size and MR may need another, so the parameter will have different values in hdfs-site.xml and mapred-site.xml. Placing the parameter in core-default.xml makes it somewhat confusing.
        Konstantin Shvachko added a comment -

        The new patch fixes the typo mentioned by Suresh.
        Suresh Srinivas added a comment -

        We should add separate config params for HDFS and MR. Also, I am not sure a default optimal rpc queue size and number of handlers can be determined, given that cluster sizes can differ significantly. We should set reasonable values for these parameters (could be 40 handlers and a queue size of 100 per handler, as is done currently). For deployments where the default values do not work well, they could be tweaked depending on cluster size, required throughput, etc.
        Konstantin Shvachko added a comment -

        > We should add separate config params for HDFS and MR.

        We can use one parameter to configure HDFS and MR separately by setting it in hdfs-site.xml and mapred-site.xml, respectively.
        Looks like people feel strongly about documenting the parameter. I'll add it to core-default.xml.
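        To make that mechanism concrete, here is a hedged sketch of such per-subsystem overrides in Hadoop's standard *-site.xml property format. The values are illustrative assumptions, not recommendations from this thread:

```xml
<!-- hdfs-site.xml: queue size for the NameNode's IPC server (illustrative value) -->
<property>
  <name>ipc.server.handler.queue.size</name>
  <value>500</value>
</property>

<!-- mapred-site.xml: a different size for the JobTracker (illustrative value) -->
<property>
  <name>ipc.server.handler.queue.size</name>
  <value>200</value>
</property>
```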
        Raghu Angadi added a comment -

        The current RPC is not that sensitive to this queue size.

        Queue size was very important before the RPC improvements in 2008, because the server used to rudely drop RPCs if the queue was full. But now the RPC server handles a full queue and bursty traffic quite well.

        Did you recently come across a case where adjusting this value improves things?

        Either way, the undocumented config parameter is fine.
        Konstantin Shvachko added a comment -

        > Did you recently come across a case where adjusting this value improves things?

        Yes. As we observed recently, the NN was viewed from the outside as unresponsive; that is, clients, the JT, and TTs could not connect to it. If you looked at the name-node at the same time, it was working fine: no GC, average cpu usage. It turned out some clients were doing listStatus on large directories, and while processing the listStatus calls the NN rpc server maxed out on the call queue, which did not let others connect. We wanted to make the queue large enough that (almost) everybody could connect and wait in the queue rather than retrying. The only way to increase the queue size currently is to increase the handler count, which was done. The idea here is to be able to experiment with the queue size and the handler count independently of each other.

        I agree on making it undocumented, for the two reasons I mentioned above. Suresh, Hairong, and Sanjay think it is important to have it documented. You might have better arguments.

        Another thing: we keep thinking about the queue size in per-handler terms. Should we rather specify the total queue size and eliminate the pseudo-dependency on the handler count? The *.handler.count parameters are not in common, so it is hard to correlate handlers with queue sizes.
        Suresh Srinivas added a comment -

        I think we should document the new parameter for the following reasons:

        1. The number of handlers is currently documented. The queue size per handler is closely related to it and should be documented as well. These numbers need tweaking based on the size of the cluster and the type of load. For example, a cluster with a smaller heartbeat period requires a bigger queue with the same number of handlers. A cluster could also live with longer latency instead of having to increase the number of handlers.
        2. The current approach of increasing the number of handlers has a drawback: the response buffer per handler could take up a significantly large heap as the number of handlers grows.

        That said, to me the more important thing is to have this parameter configurable. Whether it is documented or not is secondary.

        The queue size should be dependent on the handler count. As the queue size increases, to a certain extent (based on time spent in locks, the cost of each request, etc.) the system can benefit from more threads. Keeping them related, as is done today, conveys this more clearly.
        Konstantin Shvachko added a comment -

        I updated the patch; it applies to current trunk now.
        I added the parameter to core-default.xml.
        If we decide not to document it, it is easy to remove the parameter from the xml file.
        Suresh Srinivas added a comment -

        +1 patch looks good.
        Suresh Srinivas added a comment -

        Sorry, I missed this in my previous post: should we add to the parameter documentation that this configuration applies to all the subsystems (HDFS, MapReduce), and that to configure the subsystems separately, the parameter needs to be added to the specific config file?
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12436898/handlerQueueSizeConfig.patch
        against trunk revision 915168.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/380/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/380/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/380/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/380/console

        This message is automatically generated.

        Konstantin Shvachko added a comment -

        Updating the patch to current trunk and submitting patches for 0.21 and 0.20.
        Finally decided to make the parameter undocumented in order to avoid incompatibility issues.
        Konstantin Shvachko added a comment -

        I just committed this.
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk #261 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/261/)
        HADOOP-1849. Add undocumented configuration parameter for per handler call queue size in IPC Server. Contributed by Konstantin Shvachko.
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #193 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/193/)
        HADOOP-1849. Add undocumented configuration parameter for per handler call queue size in IPC Server. Contributed by Konstantin Shvachko.

          People

          • Assignee: Konstantin Shvachko
          • Reporter: Raghu Angadi
          • Votes: 0
          • Watchers: 4
