HBase / HBASE-5190

Limit the IPC queue size based on calls' payload size

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.5
    • Fix Version/s: 0.94.0, 0.95.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Each region server can only have 1GB worth of calls' payload in flight and will refuse new calls when above that limit (those get automatically retried on the client side). Also, ipc.server.max.queue.size is now called ipc.server.max.callqueue.length

      Description

      Currently we limit the number of calls in the IPC queue only by their count. It used to be really high and was dropped recently to num_handlers * 10 (so 100 by default) because it was easy to OOME yourself when huge calls were being queued. It's still possible to hit this problem if you use really big values and/or a lot of handlers, so the idea is that we should take the payload size into account. I can see 3 solutions:

      • Do the accounting outside of the queue itself for all calls coming in and out and when a call doesn't fit, throw a retryable exception.
      • Same accounting but instead block the call when it comes in until space is made available.
      • Add a new parameter for the maximum size (in bytes) of a call, then set the length of the IPC queue (in number of items) so that the whole queue cannot exceed some predefined maximum size in bytes.
      1. HBASE-5190-v3.patch
        7 kB
        Jean-Daniel Cryans
      2. HBASE-5190-v2.patch
        6 kB
        Jean-Daniel Cryans
      3. HBASE-5190.patch
        6 kB
        Jean-Daniel Cryans
      4. 5190.addendum
        2 kB
        Ted Yu

        Activity

        stack added a comment -

        Talking w/ J-D on backchannel, #2 is bad (blocking is what we currently do, so it's just as bad).

        If we block, it's likely the client will time out. If it times out, it will redo the request, adding a new payload to the server-side IPC queue. Now the queue has the same load twice (bad if ICV). The server doesn't know the client timed out until it goes to reply and the client is no longer there, so it can double-count or double-enter a value.

        Jean-Daniel Cryans added a comment -

        About #3.

        I like the elegance of that solution since we don't have to keep track of the calls in flight but I see 2 big issues:

        • if you set a max call size you need to keep both clients and servers in sync, and also decide who's going to do the check.
        • if you plan for big calls by default, you may end up with a tiny size for the queue. For example, let's say you cap calls at 10% of the heap and set their max individual size at 10MB: it means that you can only allow 10 items in the queue (and that doesn't account for the listeners).
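The sizing concern in the second bullet is just arithmetic: the derived queue length is the total byte budget divided by the per-call cap. A tiny illustration using the numbers from the comment (hypothetical helper, not code from the patch):

```java
// Worked example for the sizing concern above: with a 1GB heap,
// a queue budget of 10% of the heap, and a 10MB per-call cap,
// the derived queue length is only about 10 items.
class QueueSizing {
    static long derivedQueueLength(long heapBytes, double queueFraction,
                                   long maxCallBytes) {
        // total byte budget for the queue, divided by the worst-case call size
        return (long) (heapBytes * queueFraction) / maxCallBytes;
    }
}
```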
        Jean-Daniel Cryans added a comment -

        About #1.

        The issue with this is that by throwing the call back to the client you generate a lot more IO and CPU (serializing, deserializing), and there's the possibility of starving those clients that have bigger calls.

        Jean-Daniel Cryans added a comment -

        This patch is what I've been testing, it's not polished yet. I implemented the first solution.

        DEFAULT_MAX_QUEUE_SIZE_PER_HANDLER gets renamed to DEFAULT_MAX_CALLQUEUE_LENGTH_PER_HANDLER in order to add DEFAULT_MAX_CALLQUEUE_SIZE, which is the max size for all calls that are in flight (now that I write this I think I need to change that name too, hehe).

        An AtomicLong is responsible for keeping track of the total size. It could be a bottleneck, I haven't done any checks in that regard, but if it becomes one it could be lessened by not taking small calls (like <32KB) into account.

        We account for the call size from the moment it's inserted into the queue up to when the call is done. I considered doing that only after the call is taken off the queue, but then it's still easy to OOME yourself if many big calls come in at the same time while handlers are free.

        When a call is rejected it is retried on the client side.
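The accounting described above can be sketched roughly as follows. This is a simplified, hypothetical illustration of the approach (the class and method names are stand-ins, not the actual HBaseServer code):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the first solution: a running total of in-flight call
// payload, incremented on enqueue and decremented when the call is
// done. Calls that would push the total over budget are rejected
// with an exception the client treats as retryable.
class CallQueueAccounting {
    /** Sent back to the client, which retries the call. */
    static class CallQueueTooBigException extends RuntimeException {
        CallQueueTooBigException(String msg) { super(msg); }
    }

    private final AtomicLong callQueueSize = new AtomicLong(0);
    private final long maxCallQueueSize; // e.g. 1GB by default

    CallQueueAccounting(long maxCallQueueSize) {
        this.maxCallQueueSize = maxCallQueueSize;
    }

    /** Called before enqueueing; rejects the call if over budget. */
    void enqueue(long callSize) {
        if (callQueueSize.get() + callSize > maxCallQueueSize) {
            throw new CallQueueTooBigException("Call queue is full, retry later");
        }
        callQueueSize.addAndGet(callSize);
    }

    /** Called once the handler has finished processing the call. */
    void done(long callSize) {
        callQueueSize.addAndGet(-callSize);
    }

    long inFlightBytes() { return callQueueSize.get(); }
}
```

Note that, as the comment says, the size is held from enqueue until the call completes, not just while it sits in the queue.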

        Ted Yu added a comment -

        Can we utilize Cliff Click Counter to reduce contention ?
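Cliff Click's high-scale-lib Counter reduces contention by striping the count across multiple cells that are summed on read; the JDK later shipped the same idea as java.util.concurrent.atomic.LongAdder (Java 8). A sketch using LongAdder as a stand-in for the proposed counter:

```java
import java.util.concurrent.atomic.LongAdder;

// LongAdder implements the same striped-counter idea as Cliff
// Click's high-scale-lib Counter: under contention, threads update
// separate cells and sum() aggregates them, trading an exact
// point-in-time snapshot for much cheaper concurrent increments.
class StripedCallQueueSize {
    private final LongAdder size = new LongAdder();

    void add(long callSize)    { size.add(callSize); }
    void remove(long callSize) { size.add(-callSize); }
    long get()                 { return size.sum(); } // not a linearizable snapshot
}
```

With a striped counter the over-budget check becomes approximate, which is acceptable for a soft limit like this one.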

        Todd Lipcon added a comment -

        I haven't looked at the patch yet (sorry) but it seems from the discussion that this is trying to do the limiting at queue level. If instead, we did the limiting where we read from the network socket, then we'd have an automatic backpressure mechanism - clients would block on send, and the memory usage would be pushed to the client side TCP buffers rather than the HBase heap.

        The downside is, of course, that META requests and such can get blocked behind others. But that can be solved in other ways as well (e.g. separate TCP ports for meta regions or some such)

        Ted Yu added a comment -
        -      this.priorityCallQueue = new LinkedBlockingQueue<Call>(maxQueueSize); // TODO hack on size
        +      this.priorityCallQueue = new LinkedBlockingQueue<Call>(maxQueueLength); // TODO hack on size
        

        The TODO above can be dropped.

        Todd's comment makes sense. But since it requires more work, we can check in this patch for now.

        Jean-Daniel Cryans added a comment -

        The TODO above can be dropped

        My understanding of that TODO is that it shouldn't be setting as many priority handlers as there are normal handlers, so it still stands.

        Todd's comment makes sense

        I agree.

        But since it requires more work, we can check in this patch for now.

        I disagree. I'll investigate his idea, and I'm in no hurry to check this in since it's targeted for 0.94.

        Lars Hofhansl added a comment -

        Wanna finish this for 0.94, J-D? Or should we bump it to 0.96?

        Jean-Daniel Cryans added a comment -

        The current patch works, I've tested it extensively through massive imports. Current concerns:

        • I haven't done a performance comparison, e.g. is it going to slow down traffic because of the additional checks? Most of my testing was done while hitting the limit all the time, which definitely slows down my throughput, but that's expected.
        • The exception "Call queue already full" doesn't make it to the client, what happens is that it's being printed server-side and the client gets an EOF. That's bad.
        • What default should we use? In my testing I saw that 100MB might be too small, but ideally that needs to scale with the amount of memory.

        I don't mind finishing this for 0.94 if there's demand/motivation for it.

        Lars Hofhansl added a comment -

        I doubt you'll see a slowdown from this. Then again, AtomicLong needs a memory barrier as far as I know, so it could potentially add a noticeable slowdown. Hmm...

        100MB is per RegionServer, right? Does seem a bit small. Maybe 1G?

        We really need a streaming API for large rows, but that's a different story.

        I take it that maybe this is something to consider for 0.96. Agreed?

        Jean-Daniel Cryans added a comment -

        100MB is per RegionServer, right? Does seem a bit small. Maybe 1G?

        Might be a good default, those with the default heap will definitely not get any help here though.

        I take it that maybe this is something to consider for 0.96. Agreed?

        The more I think about it, the more I want this in 0.94 because it can really give us a better understanding of those issues we see on the mailing list.

        Lars Hofhansl added a comment -

        Fair enough. I think we don't need to worry about a slow down here.

        So that leaves passing the exception on to the client.
        How is the exception currently handled when we go over the queue length? If the client also gets an EOF in that case, then we can live with that.

        Jean-Daniel Cryans added a comment -

        Instead of throwing the IOE it's better if I put it in the try block just below in the code as it already handles the setting up of a response. I just need to change the messaging there and refactor some bits.

        Jean-Daniel Cryans added a comment -

        New patch: maximum queue size is bumped to 1GB, the exception is now sent back to the client.

        Ted Yu added a comment -

        Should we use high_scale_lib for the callQueueSize ?

               if (LOG.isDebugEnabled())
        -        LOG.debug(" got call #" + id + ", " + buf.length + " bytes");
        +        LOG.debug(" got call #" + id + ", " + callSize + " bytes");
        

        Please wrap the log with braces.

        +            IOException.class.getName(),
        +            "Call queue is full, retry later");
        

        Would giving some details about queue capacity help client make better decision ?

        +    // TODO add a deprecated comment for ipc.server.max.queue.size
        

        Should the above comment be added to 0.90.7 ?

        stack added a comment -

        +1 after chatting w/ J-D about what happens when we fill the queue (we send the client an IOE, it retries, which fills the network with more back-and-forths, but it should act like pushback on the client...)

        Jean-Daniel Cryans added a comment -

        New patch. I added some code (that bends over backward) to support the old ipc.server.max.queue.size which I'm planning to only add to 0.94, trunk will just read from the new config.

        Should we use high_scale_lib for the callQueueSize ?

        Added that.

        Please wrap the log with braces.

        Seems superfluous, but anyway, I changed that too.

        Would giving some details about queue capacity help client make better decision ?

        I fixed the comment a bit. I don't want to be too chatty in the log, we can add more in the book.

        Should the above comment be added to 0.90.7 ?

        So I cleared that out. We could add a warning in 0.90 and 0.92 yeah.
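Supporting the renamed key typically means reading the new name first and falling back to the deprecated one. A hedged sketch of that fallback (a hypothetical helper backed by a plain Map, not the actual Configuration-based patch code):

```java
import java.util.Map;

// Hypothetical lookup for a renamed configuration key: prefer the
// new name, fall back to the deprecated one, then to the computed
// default (num_handlers * 10, so 100 with 10 handlers).
class ConfFallback {
    static int getQueueLength(Map<String, String> conf,
                              int handlerCount, int defaultFactor) {
        String v = conf.get("ipc.server.max.callqueue.length");
        if (v == null) {
            v = conf.get("ipc.server.max.queue.size"); // deprecated name
        }
        return v != null ? Integer.parseInt(v)
                         : handlerCount * defaultFactor;
    }
}
```

As the comment says, only 0.94 needs this shim; trunk reads just the new key.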

        Lars Hofhansl added a comment -

        +1 on v3

        Lars Hofhansl added a comment -

        Actually, can we now increase the default queue length?
        For small requests 10 is awfully small.

        stack added a comment -

        Actually, can we now increase the default queue length?

        This is a good suggestion

        Jean-Daniel Cryans added a comment -

        It's not exactly 10, it's num_handlers * 10 so by default 100. Is that better?

        stack added a comment -

        @JD Can do in another issue.

        Lars Hofhansl added a comment -

        @J-D: Oh yeah forgot that. Fine then.
        Still +1

        Lars Hofhansl added a comment -

        Wanna commit, J-D?

        Jean-Daniel Cryans added a comment -

        Yep, right after I finish running medium tests on trunk.

        Jean-Daniel Cryans added a comment -

        Committed to 0.94 and trunk, thanks for the reviews guys.

        Hudson added a comment -

        Integrated in HBase-0.94 #49 (See https://builds.apache.org/job/HBase-0.94/49/)
        HBASE-5190 Limit the IPC queue size based on calls' payload size (Revision 1304635)

        Result = SUCCESS
        jdcryans :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #2694 (See https://builds.apache.org/job/HBase-TRUNK/2694/)
        HBASE-5190 Limit the IPC queue size based on calls' payload size (Revision 1304634)

        Result = SUCCESS
        jdcryans :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        Ted Yu added a comment -

        The new parameter to Call ctor causes secure HBase build to fail.

        Ted Yu added a comment -

        Suggested addendum.

        @J-D:
        Please take a look.

        Lars Hofhansl added a comment -

        +1 on addendum

        stack added a comment -

        Addendum looks good to me. +1

        Jean-Daniel Cryans added a comment -

        Committed the addendum, thanks for looking at this Ted. Also, sorry I forgot about security.

        Hudson added a comment -

        Integrated in HBase-0.94 #56 (See https://builds.apache.org/job/HBase-0.94/56/)
        HBASE-5190 Limit the IPC queue size based on calls' payload size
        (Ted's addendum) (Revision 1305469)

        Result = FAILURE
        jdcryans :
        Files :

        • /hbase/branches/0.94/security/src/main/java/org/apache/hadoop/hbase/ipc/SecureServer.java
        Hudson added a comment -

        Integrated in HBase-0.94-security #3 (See https://builds.apache.org/job/HBase-0.94-security/3/)
        HBASE-5190 Limit the IPC queue size based on calls' payload size
        (Ted's addendum) (Revision 1305469)

        Result = ABORTED
        jdcryans :
        Files :

        • /hbase/branches/0.94/security/src/main/java/org/apache/hadoop/hbase/ipc/SecureServer.java
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #151 (See https://builds.apache.org/job/HBase-TRUNK-security/151/)
        HBASE-5190 Limit the IPC queue size based on calls' payload size
        (Ted's addendum) (Revision 1305468)

        Result = FAILURE
        jdcryans :
        Files :

        • /hbase/trunk/security/src/main/java/org/apache/hadoop/hbase/ipc/SecureServer.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #2697 (See https://builds.apache.org/job/HBase-TRUNK/2697/)
        HBASE-5190 Limit the IPC queue size based on calls' payload size
        (Ted's addendum) (Revision 1305468)

        Result = FAILURE
        jdcryans :
        Files :

        • /hbase/trunk/security/src/main/java/org/apache/hadoop/hbase/ipc/SecureServer.java

          People

          • Assignee:
            Jean-Daniel Cryans
            Reporter:
            Jean-Daniel Cryans
          • Votes:
            0
            Watchers:
            7
