HBASE-5560

Avoid RegionServer GC caused by timed-out calls

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0, 0.95.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The HBaseRpcServer queues up rpc responses if the socket connection to the client is not yet ready to receive data. Calls are queued here until a 15-minute timeout occurs. I am able to generate a full GC when I artificially make a client read rpc responses very slowly. This jira is to make the 15-minute timeout configurable.

      1. ASF.LICENSE.NOT.GRANTED--D2241.4.patch
        3 kB
        Phabricator
      2. ASF.LICENSE.NOT.GRANTED--D2241.3.patch
        2 kB
        Phabricator
      3. ASF.LICENSE.NOT.GRANTED--D2241.2.patch
        2 kB
        Phabricator
      4. ASF.LICENSE.NOT.GRANTED--D2241.1.patch
        2 kB
        Phabricator

        Activity

        Phabricator added a comment -

        dhruba requested code review of "[jira]HBASE-5560 Avoid RegionServer GC caused by timed-out calls".
        Reviewers: stack, tedyu, sc, JIRA

        A slow client is not consuming rpc responses from the server, but the server caches the call responses for up to 15 minutes (hardcoded). This caused the regionserver to run out of old-gen space and trigger a full GC.

        This patch makes the 15-minute value settable through a configuration parameter, "ipc.client.call.purge.timeout". The default is still 15 minutes to maintain backward compatibility.
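
A minimal sketch of the idea, not the code from the patch: each queued response carries the time it was enqueued, and a purge pass drops any response that has waited at least as long as a configurable timeout. The names `ResponseQueue`, `QueuedCall`, and `purge` are hypothetical, and `java.util.Properties` stands in for the Hadoop `Configuration` object. Only the key `ipc.client.call.purge.timeout` and the 15-minute default come from the description above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.Properties;

// Hypothetical illustration; this is not the HBaseServer implementation.
class ResponseQueue {
    // Key named in the patch description; 15 minutes is the default it preserves.
    static final String PURGE_TIMEOUT_KEY = "ipc.client.call.purge.timeout";
    static final long DEFAULT_PURGE_TIMEOUT_MS = 15 * 60 * 1000L;

    // A response waiting for a slow client, stamped with its enqueue time.
    static final class QueuedCall {
        final int id;
        final long enqueuedAtMs;

        QueuedCall(int id, long enqueuedAtMs) {
            this.id = id;
            this.enqueuedAtMs = enqueuedAtMs;
        }
    }

    private final Deque<QueuedCall> pending = new ArrayDeque<>();
    private final long purgeTimeoutMs;

    ResponseQueue(Properties conf) {
        // Fall back to the default when the key is not set.
        String value = conf.getProperty(PURGE_TIMEOUT_KEY);
        this.purgeTimeoutMs = (value == null) ? DEFAULT_PURGE_TIMEOUT_MS : Long.parseLong(value);
    }

    void enqueue(int id, long nowMs) {
        pending.addLast(new QueuedCall(id, nowMs));
    }

    // Drops every call that has waited at least purgeTimeoutMs; returns how many were dropped.
    int purge(long nowMs) {
        int dropped = 0;
        Iterator<QueuedCall> it = pending.iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().enqueuedAtMs >= purgeTimeoutMs) {
                it.remove();
                dropped++;
            }
        }
        return dropped;
    }

    int size() {
        return pending.size();
    }
}
```

With a shorter timeout, responses that a slow client never reads become unreachable sooner, so fewer of them are likely to survive long enough to fill the old generation.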

        TEST PLAN
        Run all unit tests

        REVISION DETAIL
        https://reviews.facebook.net/D2241

        AFFECTED FILES
        src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java

        Lars Hofhansl added a comment -

        +1 on patch.
        Isn't 15mins awfully long? Should we lower the default to (say) 5mins?

        Lars Hofhansl added a comment -

        Let's get this into 0.94 as well.

        Phabricator added a comment -

        sc has commented on the revision "[jira]HBASE-5560 Avoid RegionServer GC caused by timed-out calls".

        +1
        Only have one minor comment.

        INLINE COMMENTS
        src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java:1461 Can you put
        15 * 60 * 1000L
        to make it easier to read?

        REVISION DETAIL
        https://reviews.facebook.net/D2241

        Phabricator added a comment -

        dhruba updated the revision "[jira]HBASE-5560 Avoid RegionServer GC caused by timed-out calls".
        Reviewers: stack, tedyu, sc, JIRA

        Changed the constant to 15 * 6 * 1000 (instead of 9000).
        I did not change it to 5 minutes for backward-compatibility reasons.
        Lars: if you feel strongly that we should change it to 5 min, please let me know
        and I will make the change.

        REVISION DETAIL
        https://reviews.facebook.net/D2241

        AFFECTED FILES
        src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java

        Phabricator added a comment -

        stack has commented on the revision "[jira]HBASE-5560 Avoid RegionServer GC caused by timed-out calls".

        Excellent lads. Good one. +1. And +1 on 90 seconds rather than 15 minutes (though this patch still seems to have 15 minutes, in spite of your comment, Dhruba).

        Reminds me of another ugly one: the queue of incoming rpc requests, which was hard-coded at 100 outstanding requests * the number of handlers (so, with batch puts, that could be 2MB*100*100 of unaccounted-for memory when the server was working hard).
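
The figure in parentheses can be worked out as below. This is only a back-of-the-envelope sketch: the 2 MB per batch put and the handler count of 100 are the illustrative numbers from the comment, not measured values, and the class and method names are hypothetical.

```java
// Hypothetical worked example of the estimate quoted in the comment.
class QueueMemoryEstimate {
    // Worst case: every queue slot holds a request of the given size.
    static long worstCaseBytes(long bytesPerRequest, int queuedPerHandler, int handlers) {
        return bytesPerRequest * queuedPerHandler * handlers;
    }

    public static void main(String[] args) {
        long twoMb = 2L * 1024 * 1024;
        long bytes = worstCaseBytes(twoMb, 100, 100);
        // 2 MB * 100 * 100 = 20,000 MB, roughly 19.5 GB of queued request data.
        System.out.println(bytes / (1024 * 1024) + " MB");
    }
}
```

Under the same assumptions, a queue of 10 requests per handler with 10 handlers would come to 200 MB.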

        REVISION DETAIL
        https://reviews.facebook.net/D2241

        Lars Hofhansl added a comment -

        I think Dhruba meant 15 * 60 * 1000 and 900000

        Should we file a separate issue for the 2nd problem you mentioned? It seems bad, and a fix would be good to get into a performance release.

        Lars Hofhansl added a comment -

        @Stack: The ipc queue size is already configurable (ipc.server.max.queue.size) and defaults to 10. HandlerCount also defaults to 10. So should be OK? Or did you want to put a limit on the total memory used by the queue?

        dhruba borthakur added a comment -

        Lars is asking to change the default value to 5 min.
        Stack is asking to change the default value to 90 sec.
        Any consensus?

        Lars Hofhansl added a comment -

        easiest is to just leave the 15m default

        dhruba borthakur added a comment -

        The problem with the 5 min default is that it gives a bad user experience out of the box. If people are ok with it, I would prefer to change it to 90 seconds as Stack proposed. Is that ok with you, Lars? I know it breaks backward compatibility, but I will update the release notes in the JIRA.

        stack added a comment -

        Pardon me lads. Did not mean to confuse. Lars, I was telling a story. It used to be a queue of 100 times the number of handlers. After we figured that out, it was changed to be configurable and ten by default. As for 15 minutes, that seems way too long. I'm good w/ 5 minutes but would prefer 90 seconds (which is 1.5 times the rpcTimeout – the client won't be there after 60 seconds pass... should we hold on for some factor times rpcTimeout?).

        Lars Hofhansl added a comment -

        Heh...
        +1 on 90s timeout.

        dhruba borthakur added a comment -

        The default rpc client timeout is Max_Value, see HConstants.DEFAULT_HBASE_CLIENT_OPERATION_TIMEOUT. Where do you see that rpcTimeout is to be 60 seconds?

        Phabricator added a comment -

        dhruba updated the revision "[jira]HBASE-5560 Avoid RegionServer GC caused by timed-out calls".
        Reviewers: stack, tedyu, sc, JIRA

        Changed default purge-timeout to 90 seconds

        REVISION DETAIL
        https://reviews.facebook.net/D2241

        AFFECTED FILES
        src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java

        stack added a comment -

        @Dhruba

        In trunk:

        src/main/java/org/apache/hadoop/hbase/HConstants.java:  public static int DEFAULT_HBASE_RPC_TIMEOUT = 60000;
        
        ...
        
        In HConnectionManager.....
        
              this.rpcTimeout = conf.getInt(
                  HConstants.HBASE_RPC_TIMEOUT_KEY,
                  HConstants.DEFAULT_HBASE_RPC_TIMEOUT);
        
        
        
        Phabricator added a comment -

        dhruba updated the revision "[jira]HBASE-5560 Avoid RegionServer GC caused by timed-out calls".
        Reviewers: stack, tedyu, sc, JIRA

        Changed the default timeout to be 2 times the HConstants.DEFAULT_HBASE_RPC_TIMEOUT
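
As a rough sketch of what this revision describes (not the committed code): the default is derived from the RPC timeout instead of being hardcoded, and an explicit setting still overrides it. The value 60000 for `DEFAULT_HBASE_RPC_TIMEOUT` and the key `ipc.client.call.purge.timeout` are taken from this thread. The class name, the method name, and the use of `java.util.Properties` in place of Hadoop's `Configuration` are assumptions made for illustration.

```java
import java.util.Properties;

// Hypothetical illustration of deriving the purge timeout default.
class PurgeTimeoutDefault {
    // Value quoted from HConstants earlier in this thread, in milliseconds.
    static final int DEFAULT_HBASE_RPC_TIMEOUT = 60000;
    static final String PURGE_TIMEOUT_KEY = "ipc.client.call.purge.timeout";

    // Returns the configured purge timeout, or twice the default RPC timeout when unset.
    static long purgeTimeoutMs(Properties conf) {
        long fallback = 2L * DEFAULT_HBASE_RPC_TIMEOUT; // 120,000 ms = 2 minutes
        String configured = conf.getProperty(PURGE_TIMEOUT_KEY);
        return (configured == null) ? fallback : Long.parseLong(configured.trim());
    }
}
```

Tying the default to the RPC timeout follows the reasoning in the discussion: once the client has given up on a call, there is little value in holding its response much longer.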

        REVISION DETAIL
        https://reviews.facebook.net/D2241

        AFFECTED FILES
        src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java

        Phabricator added a comment -

        lhofhansl has accepted the revision "[jira]HBASE-5560 Avoid RegionServer GC caused by timed-out calls".

        lgtm

        REVISION DETAIL
        https://reviews.facebook.net/D2241

        BRANCH
        svn

        Lars Hofhansl added a comment -

        This just needs to be committed, no?

        stack added a comment -

        Applied to trunk, 0.92 and 0.94 branches. Thanks for the patch, Dhruba.

        Hudson added a comment -

        Integrated in HBase-TRUNK #2691 (See https://builds.apache.org/job/HBase-TRUNK/2691/)
        HBASE-5560 Avoid RegionServer GC caused by timed-out calls (Revision 1303512)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        Hudson added a comment -

        Integrated in HBase-0.94 #45 (See https://builds.apache.org/job/HBase-0.94/45/)
        HBASE-5560 Avoid RegionServer GC caused by timed-out calls (Revision 1303513)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        Hudson added a comment -

        Integrated in HBase-0.92 #333 (See https://builds.apache.org/job/HBase-0.92/333/)
        HBASE-5560 Avoid RegionServer GC caused by timed-out calls (Revision 1303515)
        HBASE-5560 Avoid RegionServer GC caused by timed-out calls (Revision 1303514)

        Result = FAILURE
        stack :
        Files :

        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java

        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #145 (See https://builds.apache.org/job/HBase-TRUNK-security/145/)
        HBASE-5560 Avoid RegionServer GC caused by timed-out calls (Revision 1303512)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        Phabricator added a comment -

        dhruba has closed the revision "[jira]HBASE-5560 Avoid RegionServer GC caused by timed-out calls".

        REVISION DETAIL
        https://reviews.facebook.net/D2241

        To: stack, tedyu, sc, JIRA, lhofhansl, dhruba


          People

          • Assignee:
            dhruba borthakur
            Reporter:
            dhruba borthakur
          • Votes:
            0
            Watchers:
            4
