HBase
  1. HBase
  2. HBASE-6728

[89-fb] prevent OOM possibility due to per connection responseQueue being unbounded

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.3
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      This issue adds config parameter, ipc.server.response.queue.maxsize, to control the maximum size of response queue.
      Default response queue max size is 1GB.
      Show
      This issue adds config parameter, ipc.server.response.queue.maxsize, to control the maximum size of response queue. Default response queue max size is 1GB.

      Description

      The per connection responseQueue is an unbounded queue. The request handler threads today try to send the response in line, but if things start to backup, the response is sent via a per connection responder thread. This intermediate queue, because it has no bounds, can be another source of OOMs.

      [Have not looked at this issue in trunk. So it may or may not be applicable there.]

      1. 6728.94
        21 kB
        Ted Yu
      2. 6728-trunk.txt
        26 kB
        Ted Yu

        Issue Links

          Activity

          Kannan Muthukkaruppan created issue -
          Hide
          stack added a comment -

          I looked at trunk. We have same issue. responseQueue is a LinkedList. We keep a count of entries but we do not put an upper bound on how many items we add.

          Show
          stack added a comment - I looked at trunk. We have same issue. responseQueue is a LinkedList. We keep a count of entries but we do not put an upper bound on how many items we add.
          Kannan Muthukkaruppan made changes -
          Field Original Value New Value
          Description The per connection responseQueue is an unbounded queue. The request handler threads today try to send the response in line, but if things start to backup, the response is sent via a per connection responder thread. The intermediate queue used in between has no bound either, and can be another source of OOMs.

          [Have not looked at this issue in trunk. So it may or may not be applicable there.]
          The per connection responseQueue is an unbounded queue. The request handler threads today try to send the response in line, but if things start to backup, the response is sent via a per connection responder thread. This intermediate queue, because it has no bounds, can be another source of OOMs.

          [Have not looked at this issue in trunk. So it may or may not be applicable there.]
          Kannan Muthukkaruppan made changes -
          Assignee Kannan Muthukkaruppan [ kannanm ] Michal Gregorczyk [ michalgr ]
          Hide
          Kannan Muthukkaruppan added a comment -

          The details of 89-fb fix are here:

          https://reviews.facebook.net/D5337
          or
          http://svn.apache.org/viewvc?view=revision&revision=1385058

          But keeping the issue open since trunk doesn't address this yet.

          Show
          Kannan Muthukkaruppan added a comment - The details of 89-fb fix are here: https://reviews.facebook.net/D5337 or http://svn.apache.org/viewvc?view=revision&revision=1385058 But keeping the issue open since trunk doesn't address this yet.
          Hide
          Ted Yu added a comment -
          + * This test can fail from time to time and it is ok.
          + * It tests some race conditions that can happen
          + * occasionally, but not every time.
          + */
          +public class TestSizeBasedThrottler {
          

          I ran the test in trunk and didn't see failure.
          I wonder whether the above comment should be kept for trunk.

          Show
          Ted Yu added a comment - + * This test can fail from time to time and it is ok. + * It tests some race conditions that can happen + * occasionally, but not every time. + */ + public class TestSizeBasedThrottler { I ran the test in trunk and didn't see failure. I wonder whether the above comment should be kept for trunk.
          Hide
          Ted Yu added a comment -

          Patch for trunk, ported from fix in 0.89-fb

          Show
          Ted Yu added a comment - Patch for trunk, ported from fix in 0.89-fb
          Ted Yu made changes -
          Attachment 5337-trunk.txt [ 12550139 ]
          Ted Yu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s 0.96.0 [ 12320040 ]
          Ted Yu made changes -
          Attachment 5337-trunk.txt [ 12550139 ]
          Ted Yu made changes -
          Attachment 6728-trunk.txt [ 12550140 ]
          Ted Yu made changes -
          Attachment 6728-trunk.txt [ 12550140 ]
          Ted Yu made changes -
          Attachment 6728-trunk.txt [ 12550141 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550141/6728-trunk.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 82 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestSplitTransaction

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3104//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3104//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3104//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3104//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3104//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3104//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3104//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550141/6728-trunk.txt against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified tests. +1 hadoop2.0 . The patch compiles against the hadoop 2.0 profile. -1 javadoc . The javadoc tool appears to have generated 82 warning messages. +1 javac . The applied patch does not increase the total number of javac compiler warnings. -1 findbugs . The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestSplitTransaction Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3104//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3104//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3104//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3104//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3104//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3104//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3104//console This message is automatically generated.
          Ted Yu made changes -
          Attachment 6728-trunk.txt [ 12550141 ]
          Ted Yu made changes -
          Attachment 6728-trunk.txt [ 12550142 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550142/6728-trunk.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 82 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3105//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3105//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3105//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3105//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3105//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3105//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3105//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550142/6728-trunk.txt against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified tests. +1 hadoop2.0 . The patch compiles against the hadoop 2.0 profile. -1 javadoc . The javadoc tool appears to have generated 82 warning messages. +1 javac . The applied patch does not increase the total number of javac compiler warnings. -1 findbugs . The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3105//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3105//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3105//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3105//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3105//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3105//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3105//console This message is automatically generated.
          Ted Yu made changes -
          Attachment 6728-trunk.txt [ 12550142 ]
          Hide
          Ted Yu added a comment -

          Fixes some comments.

          Show
          Ted Yu added a comment - Fixes some comments.
          Ted Yu made changes -
          Attachment 6728-trunk.txt [ 12550146 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550146/6728-trunk.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 82 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3106//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3106//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550146/6728-trunk.txt against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified tests. +1 hadoop2.0 . The patch compiles against the hadoop 2.0 profile. -1 javadoc . The javadoc tool appears to have generated 82 warning messages. +1 javac . The applied patch does not increase the total number of javac compiler warnings. -1 findbugs . The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3106//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3106//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3106//console This message is automatically generated.
          Hide
          Ted Yu added a comment -

          Will hold integration till Monday so that other people can take a look at the patch.

          Show
          Ted Yu added a comment - Will hold integration till Monday so that other people can take a look at the patch.
          Ted Yu made changes -
          Link This issue is related to HBASE-2506 [ HBASE-2506 ]
          Hide
          Ted Yu added a comment -

          Plan to integrate to trunk this afternoon.

          Show
          Ted Yu added a comment - Plan to integrate to trunk this afternoon.
          Hide
          stack added a comment -

          Nit:

          Could assign and declare in the one step instead of have it straddle constructor:

          
          +    this.currentSize = new AtomicLong(0);
          

          Do we have to pollute metrics w/ an actual instance of HRegionServer:

          +  private HRegionServer regionServer;
          

          Do we have to add new public method on HRegionServer to getResponseQueueSize?

          Lets not put this in trunk, not until after hbase-6410 goes in. It does this properly. Can go into 0.94.

          Show
          stack added a comment - Nit: Could assign and declare in the one step instead of have it straddle constructor: + this .currentSize = new AtomicLong(0); Do we have to pollute metrics w/ an actual instance of HRegionServer: + private HRegionServer regionServer; Do we have to add new public method on HRegionServer to getResponseQueueSize? Lets not put this in trunk, not until after hbase-6410 goes in. It does this properly. Can go into 0.94.
          Hide
          Ted Yu added a comment -

          Integrated to trunk.

          Thanks for the patch, Michal.

          Show
          Ted Yu added a comment - Integrated to trunk. Thanks for the patch, Michal.
          Hide
          Ted Yu added a comment -

          Backed out after seeing Stack's comment.

          Show
          Ted Yu added a comment - Backed out after seeing Stack's comment.
          Hide
          stack added a comment -

          Ted Yu Thanks Ted. Elliott's coming patch has means of addressing my bugaboo.

          Show
          stack added a comment - Ted Yu Thanks Ted. Elliott's coming patch has means of addressing my bugaboo.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #3472 (See https://builds.apache.org/job/HBase-TRUNK/3472/)
          HBASE-6728 revert (Revision 1401012)
          HBASE-6728 prevent OOM possibility due to per connection responseQueue being unbounded (Revision 1401008)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/SizeBasedThrottler.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestSizeBasedThrottler.java

          tedyu :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/SizeBasedThrottler.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestSizeBasedThrottler.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #3472 (See https://builds.apache.org/job/HBase-TRUNK/3472/ ) HBASE-6728 revert (Revision 1401012) HBASE-6728 prevent OOM possibility due to per connection responseQueue being unbounded (Revision 1401008) Result = FAILURE tedyu : Files : /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/SizeBasedThrottler.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestSizeBasedThrottler.java tedyu : Files : /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/SizeBasedThrottler.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestSizeBasedThrottler.java
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #231 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/231/)
          HBASE-6728 revert (Revision 1401012)
          HBASE-6728 prevent OOM possibility due to per connection responseQueue being unbounded (Revision 1401008)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/SizeBasedThrottler.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestSizeBasedThrottler.java

          tedyu :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/SizeBasedThrottler.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestSizeBasedThrottler.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #231 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/231/ ) HBASE-6728 revert (Revision 1401012) HBASE-6728 prevent OOM possibility due to per connection responseQueue being unbounded (Revision 1401008) Result = FAILURE tedyu : Files : /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/SizeBasedThrottler.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestSizeBasedThrottler.java tedyu : Files : /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/SizeBasedThrottler.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestSizeBasedThrottler.java
          Ted Yu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Fix Version/s 0.94.3 [ 12323144 ]
          Hide
          Ted Yu added a comment -

          @Lars:
          Do you want to take a look at patch for 0.94 ?

          Show
          Ted Yu added a comment - @Lars: Do you want to take a look at patch for 0.94 ?
          Ted Yu made changes -
          Attachment 6728.94 [ 12550407 ]
          Hide
          Ted Yu added a comment -

          Ran through 0.94 test suite.
          There was one failure:

          Failed tests:   queueFailover(org.apache.hadoop.hbase.replication.TestReplication): Waited too much time for queueFailover replication. Waited 56196ms.
          

          I ran the test separately and it passed.

          Show
          Ted Yu added a comment - Ran through 0.94 test suite. There was one failure: Failed tests: queueFailover(org.apache.hadoop.hbase.replication.TestReplication): Waited too much time for queueFailover replication. Waited 56196ms. I ran the test separately and it passed.
          Hide
          Lars Hofhansl added a comment -

          Looking now.

          Show
          Lars Hofhansl added a comment - Looking now.
          Hide
          Lars Hofhansl added a comment -

          Hmm... I'm +0 on this.
          If other folks think this should go into 0.94 that's fine, but I won't push it.

          Show
          Lars Hofhansl added a comment - Hmm... I'm +0 on this. If other folks think this should go into 0.94 that's fine, but I won't push it.
          Hide
          Ted Yu added a comment -

          Integrated to 0.94 branch.

          Stack and I think this should go into 0.94

          Thanks for the review, Stack and Lars.

          Show
          Ted Yu added a comment - Integrated to 0.94 branch. Stack and I think this should go into 0.94 Thanks for the review, Stack and Lars.
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94 #549 (See https://builds.apache.org/job/HBase-0.94/549/)
          HBASE-6728 prevent OOM possibility due to per connection responseQueue being unbounded (Revision 1401382)

          Result = SUCCESS
          tedyu :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/SizeBasedThrottler.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestSizeBasedThrottler.java
          Show
          Hudson added a comment - Integrated in HBase-0.94 #549 (See https://builds.apache.org/job/HBase-0.94/549/ ) HBASE-6728 prevent OOM possibility due to per connection responseQueue being unbounded (Revision 1401382) Result = SUCCESS tedyu : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/SizeBasedThrottler.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestSizeBasedThrottler.java
          Hide
          Lars Hofhansl added a comment -

          Will this work right if I set scanner caching to 1000 and then deal with 2mb rows? In that case every response will be 2g, and it would always block and never make any progress, right?

          Show
          Lars Hofhansl added a comment - Will this work right if I set scanner caching to 1000 and then deal with 2mb rows? In that case every response will be 2g, and it would always block and never make any progress, right?
          Hide
          Lars Hofhansl added a comment -

          Stack did not +1 this. He just said: "Can go into 0.94" (as opposed to trunk).

          Show
          Lars Hofhansl added a comment - Stack did not +1 this. He just said: "Can go into 0.94" (as opposed to trunk).
          Hide
          Ted Yu added a comment -

          set scanner caching to 1000 and then deal with 2mb rows

          Then the scanner caching was set too high

          Show
          Ted Yu added a comment - set scanner caching to 1000 and then deal with 2mb rows Then the scanner caching was set too high
          Hide
          Lars Hofhansl added a comment -

          That might be, but it will hang that RegionServer forever, no? An OOM would have been preferable in that (it would cause an alert)

          Unless that is explained (maybe I am misreading the patch) or addressed soon, I would prefer backing this change out again.

          Show
          Lars Hofhansl added a comment - That might be, but it will hang that RegionServer forever, no? An OOM would have been preferable in that (it would cause an alert) Unless that is explained (maybe I am misreading the patch) or addressed soon, I would prefer backing this change out again.
          Hide
          Ted Yu added a comment -

          HBaseServer would continue responding to requests after SizeBasedThrottler.decrease() is called.
          The following is one example:

                      responseQueue.poll();
                      responseQueuesSizeThrottler.decrease(call.response.limit());    
          
          Show
          Ted Yu added a comment - HBaseServer would continue responding to requests after SizeBasedThrottler.decrease() is called. The following is one example: responseQueue.poll(); responseQueuesSizeThrottler.decrease(call.response.limit());
          Hide
          stack added a comment -

          Yeah, thought it could go into 0.94. Would think ol' Iron Hand would have final say.

          Show
          stack added a comment - Yeah, thought it could go into 0.94. Would think ol' Iron Hand would have final say.
          Hide
          Lars Hofhansl added a comment -

          I actually misread the patch. It checks the current size (not the current size + what is being added).
          So in my example above it would add the 2g response to the queue, but then block other responses until that one is done. This seems fine.

          Show
          Lars Hofhansl added a comment - I actually misread the patch. It checks the current size (not the current size + what is being added). So in my example above it would add the 2g response to the queue, but then block other responses until that one is done. This seems fine.
          Hide
          Lars Hofhansl added a comment -

          org.apache.hadoop.hbase.util.TestSizeBasedThrottler.testBigIncreases just failed in 0.94.
          https://builds.apache.org/job/HBase-0.94/550/

          Show
          Lars Hofhansl added a comment - org.apache.hadoop.hbase.util.TestSizeBasedThrottler.testBigIncreases just failed in 0.94. https://builds.apache.org/job/HBase-0.94/550/
          Hide
          Ted Yu added a comment -

          I ran the test locally over 10 times and didn't reproduce the failure.
          As the original javadoc for TestSizeBasedThrottler stated, there was a chance of hangup.

                    200 // wait for 200ms
          

          I think the timeout above is a bit short.

          I will continue trying to reproduce the test failure.

          Show
          Ted Yu added a comment - I ran the test locally over 10 times and didn't reproduce the failure. As the original javadoc for TestSizeBasedThrottler stated, there was a chance of hangup. 200 // wait for 200ms I think the timeout above is a bit short. I will continue trying to reproduce the test failure.
          Ted Yu made changes -
          Release Note This issue adds config parameter, ipc.server.response.queue.maxsize, to control the maximum size of response queue.
          Default response queue max size is 1GB.
          Hide
          Lars Hofhansl added a comment -

          This was committed. Marked it as such.

          Show
          Lars Hofhansl added a comment - This was committed. Marked it as such.
          Lars Hofhansl made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          Ted Yu made changes -
          Fix Version/s 0.96.0 [ 12320040 ]
          Ted Yu made changes -
          Link This issue relates to HBASE-7041 [ HBASE-7041 ]
          Hide
          Kannan Muthukkaruppan added a comment - - edited

          Lars wrote: <<<Will this work right if I set scanner caching to 1000 and then deal with 2mb rows? In that case every response will be 2g, and it would always block and never make any progress, right?>>

          Yes, we considered that, and that's the reason for not using a simple counting semaphore that's initialized to the max size. We want the implementation to allow one request to exceed the queue size. We set the default at 1G, but we can exceed the limit by 1 requests' size amount.

          From http://svn.apache.org/viewvc/hbase/branches/0.89-fb/src/main/java/org/apache/hadoop/hbase/util/SizeBasedThrottler.java?view=markup&pathrev=1385058 :

          15       * This implementation allows you to set the value of internal
          16	 * counter to be greater than threshold. It happens
          17	 * when internal counter is lower than threshold and
          18	 * increase method is called with parameter 'delta' big enough
          19	 * so that sum of delta and internal counter is greater than
          20	 * threshold. This is not a bug, this is a feature.
          21	 * It solves some problems:
          22	 *   - thread calling increase with big parameter will not be
          23	 *     starved by other threads calling increase with small
          24	 *     arguments.
          25	 *   - thread calling increase with argument greater than
          26	 *     threshold won't deadlock. This is useful when throttling
          27	 *     queues - you can submit object that is bigger than limit
          
          Show
          Kannan Muthukkaruppan added a comment - - edited Lars wrote: <<<Will this work right if I set scanner caching to 1000 and then deal with 2mb rows? In that case every response will be 2g, and it would always block and never make any progress, right?>> Yes, we considered that, and that's the reason for not using a simple counting semaphore that's initialized to the max size. We want the implementation to allow one request to exceed the queue size. We set the default at 1G, but we can exceed the limit by 1 requests' size amount. From http://svn.apache.org/viewvc/hbase/branches/0.89-fb/src/main/java/org/apache/hadoop/hbase/util/SizeBasedThrottler.java?view=markup&pathrev=1385058 : 15 * This implementation allows you to set the value of internal 16 * counter to be greater than threshold. It happens 17 * when internal counter is lower than threshold and 18 * increase method is called with parameter 'delta' big enough 19 * so that sum of delta and internal counter is greater than 20 * threshold. This is not a bug, this is a feature. 21 * It solves some problems: 22 * - thread calling increase with big parameter will not be 23 * starved by other threads calling increase with small 24 * arguments. 25 * - thread calling increase with argument greater than 26 * threshold won't deadlock. This is useful when throttling 27 * queues - you can submit object that is bigger than limit
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94-security-on-Hadoop-23 #9 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/9/)
          HBASE-6728 prevent OOM possibility due to per connection responseQueue being unbounded (Revision 1401382)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/SizeBasedThrottler.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestSizeBasedThrottler.java
          Show
          Hudson added a comment - Integrated in HBase-0.94-security-on-Hadoop-23 #9 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/9/ ) HBASE-6728 prevent OOM possibility due to per connection responseQueue being unbounded (Revision 1401382) Result = FAILURE tedyu : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/SizeBasedThrottler.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestSizeBasedThrottler.java
          Lars Hofhansl made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Michal Gregorczyk
              Reporter:
              Kannan Muthukkaruppan
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development