HBASE-2939: Allow Client-Side Connection Pooling

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.89.20100621
    • Fix Version/s: 0.92.0
    • Component/s: Client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      By design, the HBase RPC client multiplexes calls to a given region server (or the master for that matter) over a single socket, access to which is managed by a connection thread defined in the HBaseClient class. While this approach may suffice for most cases, it tends to break down in the context of a real-time, multi-threaded server, where latencies need to be lower and throughputs higher.

      In brief, the problem is that we dedicate one thread to handle all client-side reads and writes for a given server, which in turn forces them to share the same socket. As load increases, this is bound to serialize calls on the client-side. In particular, when the rate at which calls are submitted to the connection thread is greater than that at which the server responds, then some of those calls will inevitably end up sitting idle, just waiting their turn to go over the wire.

      In general, sharing sockets across multiple client threads is a good idea, but limiting the number of such sockets to one may be overly restrictive for certain cases. Here, we propose a way of defining multiple sockets per server endpoint, access to which may be managed through either a load-balancing or thread-local pool. To that end, we define the notion of a SharedMap, which maps a key to a resource pool, and supports both of those pool types. Specifically, we will apply that map in the HBaseClient, to associate multiple connection threads with each server endpoint (denoted by a connection id).

      Currently, the SharedMap supports the following types of pools:

      • A ThreadLocalPool, which represents a pool that builds on the ThreadLocal class. It essentially binds the resource to the thread from which it is accessed.
      • A ReusablePool, which represents a pool that builds on the LinkedList class. It essentially allows a resource to be checked out, at which point it is (temporarily) removed from the pool. When the resource is no longer required, it should be returned to the pool in order to be reused (a brief sketch follows this list).
      • A RoundRobinPool, which represents a pool that stores its resources in an ArrayList. It load-balances access to its resources by returning a different resource every time a given key is looked up.
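
      For illustration, here is a minimal, hypothetical sketch of the check-out/check-in behavior of such a reusable pool. The class name, method names, and size cap are assumptions for the example, not the actual HBase code:

      import java.util.LinkedList;

      public class ReusablePoolSketch<R> {
        // Resources currently available for checkout.
        private final LinkedList<R> available = new LinkedList<R>();
        private final int maxSize;

        public ReusablePoolSketch(int maxSize) {
          this.maxSize = maxSize;
        }

        // Checking a resource out removes it from the pool until it is returned.
        public synchronized R checkOut() {
          return available.isEmpty() ? null : available.removeFirst();
        }

        // Returning the resource makes it available for reuse, up to the cap.
        public synchronized void checkIn(R resource) {
          if (available.size() < maxSize) {
            available.addLast(resource);
          }
        }
      }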

      To control the type and size of the connection pools, we give the user a couple of parameters (viz. "hbase.client.ipc.pool.type" and "hbase.client.ipc.pool.size"). If the pool size is set to a positive number, it caps the number of resources that a pool may contain for any given key. A size of Integer#MAX_VALUE is interpreted to mean an unbounded pool.
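
      As a usage sketch, a client could opt into pooling through its Configuration before creating a table handle. The configuration keys come from this issue; the pool type and size values, the table name, and the surrounding boilerplate are illustrative assumptions:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.HTable;

      public class PooledClientExample {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          // Use a round-robin pool of up to 10 connections per server endpoint
          // instead of a single multiplexed socket (values are illustrative).
          conf.set("hbase.client.ipc.pool.type", "roundrobin");
          conf.setInt("hbase.client.ipc.pool.size", 10);

          // "mytable" is a placeholder; any table created from this
          // configuration draws its connections from the pool above.
          HTable table = new HTable(conf, "mytable");
          try {
            // ... issue gets/puts as usual ...
          } finally {
            table.close();
          }
        }
      }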

      Attachments

      1. HBASE-2939.patch
        19 kB
        Karthick Sankarachary
      2. HBASE-2939.patch
        28 kB
        Karthick Sankarachary
      3. HBASE-2939.patch
        19 kB
        Karthick Sankarachary
      4. HBASE-2939-0.20.6.patch
        28 kB
        Karthick Sankarachary
      5. HBASE-2939-LATEST.patch
        28 kB
        Karthick Sankarachary
      6. HBASE-2939-V6.patch
        34 kB
        Karthick Sankarachary
      7. HBaseClient.java
        31 kB
        Jon

        Issue Links

          Activity

          Todd Lipcon added a comment -

          Hi Karthick. Have you had a chance to run any benchmarks with this patch? As is, we do use a single TCP connection to each backend region server, but we can have multiple outstanding calls, so we should really be constrained by bandwidth, not by waiting for the previous call to return.

          Karthick Sankarachary added a comment -

          Hi Todd,

          We observed significantly lower latencies (and consequently higher throughputs) when running our load test, especially with low think times. While it is true that we can have multiple outstanding calls, we do synchronize the send and receive legs of the call (see snippets below), and I believe that serializes the calls to some extent.

          Synchronization Of Send Request

          protected void sendParam(Call call) {
            ...
            synchronized (this.out) {
              ...
            }
          }

          Synchronization Of Receive Response

          public void run() {
            ...
            while (waitForWork()) { // wait here for work - read or close connection
              receiveResponse();
            }
            ...
          }
          ryan rawson added a comment -

          I can see how using a single thread to read and write and serialize all data to a single regionserver could make things slow. I +1 the general approach here. Thanks for doing this!

          ryan rawson added a comment -

          On the thread local, do the sockets get closed when the thread goes away?

          Karthick Sankarachary added a comment -

          Actually, no, the sockets won't get closed the way it is right now. The ThreadLocalPool#values method needs to return all of the thread-local connections in it, which doesn't happen right now. Please stay tuned for an updated patch.

          ryan rawson added a comment -

          I gave this a shot using YCSB and 50 threads of parallelism, and it bit the dust and got nothing done. It also seems to be spending a lot of time expiring entries and looking them up in .META. again, which is strange considering I have a one-region table that hasn't moved.

          ryan rawson added a comment -

          OK, looks like I hit FD limits on my machines (1024, doh!).

          ryan rawson added a comment -

          Another issue testing this patch: with the config set to use roundrobin and a limited number of threads (20, 50), the client won't exit normally anymore. The HCM shutdown hook won't finish up.

          ryan rawson added a comment -

          More feedback:

          With threadlocal I get NPEs on client shutdown:
          Exception in thread "HCM.shutdownHook" java.lang.NullPointerException
          at org.apache.hadoop.hbase.ipc.HBaseClient.stop(HBaseClient.java:724)
          at org.apache.hadoop.hbase.ipc.HBaseRPC$ClientCache.stopClient(HBaseRPC.java:219)
          at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.close(HBaseRPC.java:265)
          at org.apache.hadoop.hbase.ipc.HBaseRPC.stopProxy(HBaseRPC.java:441)
          at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.close(HConnectionManager.java:1386)
          at org.apache.hadoop.hbase.client.HConnectionManager.deleteAllConnections(HConnectionManager.java:156)
          at org.apache.hadoop.hbase.client.HConnectionManager$1.run(HConnectionManager.java:83)

          Without an explicit pool type configuration, something BAD happens and the client seems to spawn sockets without limit, taking down my regionservers that have low socket limits. Setting the config to threadlocal fixes it.

          And last but not least, round robin is about as fast as without the patch, if not slower. Threadlocal is much faster (20%+) than the pre-patch version.

          Karthick Sankarachary added a comment -

          Ryan,

          Thanks for the feedback - it is much appreciated. Some of the concerns you raised should be addressed by the second version of the patch which:

          • Modifies the "threadlocal" pool's #values method so that it return all connections across all threads, instead of the one local to the current thread that could potentially be null, which should resolve the NPEs on client shutdown.
          • Defaults the pool type to a bounded "roundrobin" of size 1, which would emulate pre-patch behavior. On the other hand, the "threadlocal" pool's size is bounded only by the clients' thread pool size.

          Additionally, it:

          • Renames the SharedMap class to PoolMap, for lack of a better name.
          • Adds a test case for the PoolMap, which demonstrates how pools of different types and sizes behave under different loads.
          • Constrains the value of "hbase.client.ipc.pool.type" to one of the following: "roundrobin" and "threadlocal".

          To your point about file descriptor limits, the prescribed workaround for that is described here. We found out about that limit the hard way too.

          The 20% speedup with "threadlocal" is good, although it may be even better when the benchmark runs at higher throughputs. The "roundrobin" pool should not be unbounded, which is no longer the default - in general, its size should be somewhere between 1 and the number of user-defined threads.
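
          As promised above, here is a minimal, hypothetical sketch of how a thread-local pool could track connections across all threads so that #values can hand every connection back for cleanup. Names and structure are illustrative, not the actual patch code:

          import java.util.ArrayList;
          import java.util.Collection;
          import java.util.Collections;
          import java.util.HashSet;
          import java.util.Set;

          public class ThreadLocalPoolSketch<R> {
            private final ThreadLocal<R> local = new ThreadLocal<R>();
            // Every resource handed out, across all threads, so that values()
            // can be used to close them on shutdown.
            private final Set<R> allValues =
                Collections.synchronizedSet(new HashSet<R>());

            public R get() {
              return local.get();
            }

            public R put(R resource) {
              R previous = local.get();
              local.set(resource);
              allValues.add(resource);
              if (previous != null) {
                allValues.remove(previous);
              }
              return previous;
            }

            // Returns resources owned by every thread, not just the caller's.
            public Collection<R> values() {
              return new ArrayList<R>(allValues);
            }
          }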

          Tao Xie added a comment -

          I'm using version 0.20.6. I checked the patch and found that it just changes the single connection thread to a pool strategy, so I applied it to my 0.20.6 HBase code. But now the master cannot recognize RegionServers. Maybe this patch can only be applied to the 0.89 version?

          HBase Master log:

          2010-10-15 17:10:18,225 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META. region(s) scanned
          2010-10-15 17:11:18,069 INFO org.apache.hadoop.hbase.master.ServerManager: 0 region servers, 0 dead, average load NaN
          2010-10-15 17:11:18,229 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META. region(s) scanned
          2010-10-15 17:12:18,073 INFO org.apache.hadoop.hbase.master.ServerManager: 0 region servers, 0 dead, average load NaN
          2010-10-15 17:12:18,233 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META. region(s) scanned

          Karthick Sankarachary added a comment -

          Hi Tao,

          Judging by the master log messages, it seems that your region server has not started.

          AFAIK, this patch should work with the 0.20.6 branch as well (note that it was originally based on the trunk). To validate that, I back-ported the patch to the 0.20.6 tag (see HBASE-2939-0.20.6.patch), packaged it, started HBase, and sure enough it starts (see status below).

          hbase(main):001:0> status
          1 servers, 0 dead, 2.0000 average load
          hbase(main):002:0> version
          Version: 0.20.6, r1609fbbdc0e76921eba3dfaeffd140a3606c4da0, Tue Oct 19 22:53:37 EDT 2010
          hbase(main):003:0>

          Regards,
          Karthick

          stack added a comment -

          Making this issue critical – We owe Karthik feedback – and assigning Ryan since he was looking into this (Can you take a look RR? Karthik updated his patch... thanks).

          stack added a comment -

          Oh, I brought it into 0.92 too.

          ryan rawson added a comment -

          BTW, the recent patch applies with no major issues; some import jazz, but IntelliJ fixes that.

          I'm working on perf testing this, but I'm having problems with my test env right now.

          Karthick Sankarachary added a comment -

          Thanks for the update. Please let me know how your benchmarking goes.

          ryan rawson added a comment -

          I ran this on a small test cluster and my results were a little mixed, which is due to my environment and my benchmarking setup.

          • using YCSB and a small working set I loaded data on Regionserver R1
          • ran YCSB with small 300 row scans with 100 columns on Master M
          • ran with both this patch and without.

          The test runs with this patch did not indicate significant improvement. I think this was due to the fact that I was saturating the network between M and R1, and opening more sockets gave slight but not significant improvement. I didn't capture the numbers, but it was something like 390 ms without and 340 ms with.

          I ran a fairly interesting case since I was trying to test the contention on the single socket, and it seems like adding more sockets to a saturated network did not improve things the way I had hoped.

          Could you paste in your test scenario? I could see that for some scenarios involving small to medium gets, that this could provide an improvement to latency.

          stack added a comment -

          Moving out of 0.92.

          Jon added a comment -

          Attached is the client code we developed in house. This helped alleviate the HTable bottleneck when fetching 10K+ rows from HBase.

          It's a bit of a hack at the moment.

          Karthick Sankarachary added a comment -

          Hello Jon,

          What branch is your client code based on? It shows a lot of changes w.r.t. the trunk, so I take it you're working off of an older version?

          Regards,
          Karthick

          Jon added a comment -

          Hi Karthick,

          We are running CDH3B3.

          Karthick Sankarachary added a comment -

          Hi Jon,

          It looks like your patch hard-codes the pool type to be round-robin and the pool size to be 50, which seems reasonable to me. Also, I noticed that you're removing the connection in the event it is marked for closure, which seems reasonable as well. The latter change doesn't seem to exist in the 0.89 branch, but perhaps it came from the CDH3B3 distro.

          private synchronized void markClosed(IOException e) {
            if (shouldCloseConnection.compareAndSet(false, true)) {
          +   connections.remove(this.remoteId, this);

          In any case, your findings do corroborate the theory that the default singleton (or multiton) client connection model may break down for high throughput scenarios. I maintain that, when connection-level multiplexing starts to become a bottleneck, it is best to load-balance requests across multiple (non-overlapping) connection sockets.

          Regards,
          Karthick

          stack added a comment -

          Bringing into 0.92.0 since Karthik got support for his claim from Photobucket Jon. How do you suggest we proceed, Karthik? Clean up Jon's patch or go forward with yours? What about Ryan's query about how you are testing your patch and what you are seeing? Thanks.

          Karthick Sankarachary added a comment -

          I believe Jon's patch is based more or less on the one dated "03/Sep/10 21:14", and hard codes the pool type to "round-robin" and size to "50". In general, I think this change ought to be backward compatible w.r.t. the existing behavior, which is to map a given ConnectionId to a unique Connection. To that end, we should default the pool type to "round-robin" and its size to 1. At any rate, considering that my patch is more generic, I vote for going forward with that, with these changes (see patch attached today):

          a) Change the default pool size to 1.
          b) Clean up all trailing white spaces.

          About Ryan's query, I had a hard time rewriting the YCSB workload in terms of our (homegrown) test scenario, which involved sending 5-15K HTable.get calls per second. The problem with the existing YCSB workload is that it assumes a scan, which makes it hard to refactor. At a minimum, I am going to run the "mvn test" suite and make sure no regressions were introduced before I take another shot at YCSB, time permitting.

          stack added a comment -

          I'd suggest you not try reworking YCSB. It will be hard to bring it around (it's hard code to work with). Let us know if your patch passes all tests and I'll give it a once-over. Thanks Karthik.

          Karthick Sankarachary added a comment -

          Please review the latest patch, which by default, exhibits the existing behavior (i.e., each connection id maps to precisely one connection).

          As far as testing is concerned, all but five tests passed. FWIW, those failures occur even without the patch, so in that sense, no regressions were found. For good measure, I added a couple of test cases in TestFromClientSide, which also serve to illustrate how the connection pool may be configured.

          stack added a comment -

          I took a look at the patch. The changes to HBase seem innocuous enough. Looks like a nice addition, and it would make it easy to try out different poolings.

          What's a little odd is that you're adding new functionality without unit tests. It would help build confidence if you had a unit test to prove that the RoundRobinPool was indeed round-robin... Seems like tests would be easy enough to do since these take generics.

          I see the Pool interface. Seems to be a straight subset of the Map interface? Do we need PoolMap then? Or should PoolMap be non-public?

          Should the pool implementations be inner classes of PoolMap, since PoolMap refers to them explicitly in its enum and in its little factory for creating them?

          Why does javadoc talk about SharedMap? Should that be PoolMap?

          Good stuff Karthik.

          Karthick Sankarachary added a comment -

          It would help build confidence if you had a unit test to prove that the RoundRobinPool was indeed RoundRobin

          Point well taken. I've added a suite of tests to cover all types of pool maps.

          I see the Pool Interface. Seems to be straight subset of Map Interface? Do we need PoolMap then? Or should PoolMap be non-public?

          The primary role of the PoolMap is to associate a pool of values with every key. The first time a key is inserted, it creates a pool of the specified type and puts the value into that pool. Subsequent inserts for the same key put the value into the pre-existing pool. By the same token, when a key is removed, its corresponding pool is cleared. (A rough sketch of these semantics follows below.)

          The Pool interface, while it may seem like a subset of Map, was meant to be generic enough that it could represent not just a bounded list (a.k.a. round-robin pool) or a bounded queue (a.k.a. reusable pool), but also a thread-local object (a.k.a. thread-local pool).
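
          To illustrate those semantics, here is a minimal, hypothetical sketch of the lazy pool-per-key behavior described above. The class and method names are illustrative, not the actual PoolMap code:

          import java.util.ArrayList;
          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;

          public class PoolMapSketch<K, V> {
            private final Map<K, List<V>> pools = new HashMap<K, List<V>>();
            private final int maxPoolSize;
            private int nextIndex = 0;

            public PoolMapSketch(int maxPoolSize) {
              this.maxPoolSize = maxPoolSize;
            }

            // The first put for a key creates its pool; later puts grow it up to the cap.
            public synchronized void put(K key, V value) {
              List<V> pool = pools.get(key);
              if (pool == null) {
                pool = new ArrayList<V>();
                pools.put(key, pool);
              }
              if (pool.size() < maxPoolSize) {
                pool.add(value);
              }
            }

            // Round-robin style lookup: a different resource on each call.
            public synchronized V get(K key) {
              List<V> pool = pools.get(key);
              if (pool == null || pool.isEmpty()) {
                return null;
              }
              V value = pool.get(nextIndex % pool.size());
              nextIndex = (nextIndex + 1) % pool.size();
              return value;
            }

            // Removing a key discards its whole pool.
            public synchronized List<V> remove(K key) {
              return pools.remove(key);
            }
          }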

          Should the pool implementations be inner classes of PoolMap because PoolMap refers to them explicitly in enum and in its little factory for creating them.

          Agreed. I made the PoolMap a self-contained entity.

          Why does javadoc talk about SharedMap? Should that be PoolMap?


          Fixed.

          FYI, the version of the updated patch is V6 (sorry about the poor patch naming convention).

          stack added a comment -

          Patch looks good. Let me run tests.

          stack added a comment -

          Applied to TRUNK. Thanks for the patch Karthik (S). I assigned you the issue. Tests passed for me.

          Karthick Sankarachary added a comment -

          Thanks much! Hopefully, this patch will address some of the concerns raised in HBASE-1849 as well. On a related note, the PoolMap may also be applied to generalize the HTablePool, as described in HBASE-2938 - would you like me to revisit/update that issue?

          stack added a comment -

          For sure go ahead and update hbase-2938 and note PoolMap in hbase-1849.

          nicu marasoiu added a comment -

          Hi,

          I am not sure why multiplexing does not work so well with multiple TCP connections.
          When using one TCP connection, the multiplexing pays off, peaking at 16 threads using the same connection in my particular tests (remote client, 4kb batching of 2k puts).
          When using 16 TCP connections, however, the multiplexing benefit peaks at 2 threads sharing one TCP connection.
          The relative benefit of sharing/multiplexing seems about the same when 2 HTables share one socket. However, when using hbase.client.ipc.pool.size=16, the benefit degrades rapidly once the multiplexing factor is above 2, and drops below the baseline above 4.
          I would like to understand the underlying reason. Perhaps there are locking and contention mechanisms preventing us from loading the multiple connections the same way we can multiplex a single one.

          Here are my results with batched puts, n HTable instances (threads), m connections in the round-robin pool:

          1 HTable: 1400 records in a fixed timeframe
          2 HTables sharing the tcp socket: 1566 records in a fixed timeframe
          8 Htables sharing the tcp socket: 6500 records
          16 HTables sharing the tcp socket: 9200 records
          32 HTables sharing the tcp socket: 6500 records

          256 Htables sharing the tcp socket: 1340 recs
          16 HTables on 16 connections: 16753 recs
          32 Htables on 16 connections: 18661 recs
          64 Htables on 16 connections: 16800 recs
          128 Htables on 16 connections: 4300 recs
          256 Htables on 16 connections: 2434 recs

          When I say 16 HTables, I mean 16 threads using an HTablePool.

          You can see that multiplexing performance degrades much faster when using multiple connections than when using just one (the unpooled default).

          stack added a comment -

          Perhaps there are locking and contention mechanisms preventing us loading the multiple connections in the same way we can multiplex a single one.

          Can you see contention in thread dumps?


            People

            • Assignee:
              Karthick Sankarachary
            • Reporter:
              Karthick Sankarachary
            • Votes:
              0
            • Watchers:
              9
