Details

    • Type: Brainstorming
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Is it just me or do others sense that there is pressure building to redo the client? If just me, ignore the below... I'll just keep notes in here. Otherwise, what would the requirements for a client rewrite look like?

      + Let InterruptedException out (propagate it to callers instead of swallowing it)
      + Enveloping of messages, or space for metadata that can be passed by client to server and by server to client; e.g. "region a.b.c moved to server x.y.z", "scanner is finished", or "timeout" (a rough sketch follows this list)
      + A different RPC? One with tighter serialization.
      + A saner timeout/retry policy.
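
      A rough sketch of what the enveloping bullet could look like; every name below is hypothetical (this is not an existing HBase class). The idea is just a small header, riding alongside the serialized payload, with room for hints in either direction:

          // Hypothetical envelope; illustration only, not an existing HBase class.
          public final class RpcEnvelope {
            // Hints a server can piggyback on any response, e.g.
            // "region a.b.c moved to server x.y.z" or "scanner is finished".
            public enum Hint { NONE, REGION_MOVED, SCANNER_FINISHED, TIMEOUT }

            private final Hint hint;
            private final String hintDetail; // e.g. the new location for REGION_MOVED
            private final byte[] payload;    // the actual serialized message

            public RpcEnvelope(Hint hint, String hintDetail, byte[] payload) {
              this.hint = hint;
              this.hintDetail = hintDetail;
              this.payload = payload;
            }

            public Hint hint() { return hint; }
            public String hintDetail() { return hintDetail; }
            public byte[] payload() { return payload; }
          }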

      Does it have to support async communication? Do callbacks?

      What else?
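
      On the async question above: one possible shape for an async surface, sketched with plain java.util.concurrent types. The AsyncTable name and both methods are made up for illustration; Get and Result are the existing client classes:

          // Hypothetical async surface; the interface and its methods are
          // illustrative only, not an existing HBase API.
          import java.util.concurrent.CompletableFuture;
          import java.util.function.BiConsumer;
          import org.apache.hadoop.hbase.client.Get;
          import org.apache.hadoop.hbase.client.Result;

          public interface AsyncTable {
            // Future style: the caller composes, or blocks with .get() when it
            // wants synchronous behavior.
            CompletableFuture<Result> get(Get get);

            // Callback style: invoked when the response (or an error) arrives.
            void get(Get get, BiConsumer<Result, Throwable> callback);
          }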

        Issue Links

          Activity

          stack made changes -
          Link: This issue is related to HBASE-2182
          stack made changes -
          Link: This issue relates to HBASE-4956
          stack added a comment -

          Breathing some life back into this issue:

          Reasons for new client, updates:

          + Jonathan Payne's accounting of the unaccounted-for off-heap socket buffers allocated per thread, which makes our client OOME when there are lots of threads (HBASE-4956)
          + "Complex": lots of layers, and a long-lived zk connection (not necessary on the client?)
          + Should work against multiple versions of hbase (but that might be another issue, an rpc issue... this issue could be distinct from the rpc fixup?)

          See also Lars' comment here: https://issues.apache.org/jira/browse/HBASE-5058?focusedCommentId=13173364&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13173364

          stack made changes -
          Link: This issue relates to HBASE-2170
          stack made changes -
          Link: This issue relates to HBASE-1843
          stack made changes -
          Link: This issue relates to HBASE-1844
          stack made changes -
          Link: This issue relates to HBASE-3382
          stack made changes -
          Link: This issue relates to HBASE-2937
          stack made changes -
          Link: This issue incorporates HBASE-3584
          stack made changes -
          Summary: "Rewrite our client" changed to "Rewrite our client (client 2.0)"
          stack made changes -
          Link: This issue incorporates HBASE-2408
          Andrew Purtell added a comment -

          wouldn't it be better not to need thread pools?

          I'm not opposed to the idea. Not sure how far people want to go.

          ryan rawson added a comment -

          wouldn't it be better not to need thread pools? Right now we are using them merely to wait on sync APIs that underneath are async/multiplexing. A big waste of threads to me!

          Andrew Purtell added a comment -

          Are not executors thread pools?

          I meant merely no creation of internal thread pools, so that users can create executors with their own thread factories, etc.

          ryan rawson added a comment -

          Are not executors thread pools? Right now HCM will use the executor passed to it from HTable to do parallel queries (multi). But there is no good reason to layer more threads on top of the socket/proxy layer.

          Andrew Purtell added a comment -
          • Run-time interface extension; consider the existing dynamic RPC stuff, though it doesn't have to be done exactly that way. Regarding use (or not) of proxy objects: right now we have <T> HTable.coprocessorProxy(...), which, if users adopt it, will produce user code that does need it.
          • Option for async interaction.
          • Use executors instead of threads. Allow the user to pass in an executor pool of desired construction (a sketch follows this list).
          • Resurrect HBASE-1015?
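
          A minimal sketch of the executors bullet, assuming a client constructor that accepts a user-built pool; HypotheticalClient and its constructor are invented for illustration:

              // The user builds the pool, with their own thread factory, sizing,
              // etc.; the client never creates internal thread pools of its own.
              import java.util.concurrent.ExecutorService;
              import java.util.concurrent.Executors;
              import java.util.concurrent.ThreadFactory;

              public class ExecutorExample {
                public static void main(String[] args) {
                  ThreadFactory factory = r -> {
                    Thread t = new Thread(r, "hbase-client-worker");
                    t.setDaemon(true); // the user decides daemon-ness, priority, naming
                    return t;
                  };
                  ExecutorService pool = Executors.newFixedThreadPool(8, factory);
                  // HypotheticalClient client = new HypotheticalClient(conf, pool);
                  // ... issue requests; the pool's lifecycle stays under user control ...
                  pool.shutdown();
                }
              }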
          Jonathan Gray added a comment -

          A binary, language agnostic underlying RPC and wire protocol. Async, as an option, would be nice as well.

          I'd like more visibility and control into what is happening underneath with respect to connections to RegionServers and such. I don't like all the staticness and voodoo magic, at least not as the only option. The usage of something like a hash of Configuration has always been weird to me.

          A better API for how errors are returned; for example, I can never understand how the MultiAction stuff works without digging into the code.

          +1 to your suggestions. We can already do stuff off the back of ZK for region movement if we wanted to, but the opportunity for little hints in RPCs would be neat as well.

          Thanks for filing this, stack.

          ryan rawson added a comment -

          Things that are issues:

          • the use of proxy means that the interfaces must have InterruptedException on them, or else you get "undeclared throwable exception"; but now you are conflating a business contract (the interfaces) and networking/execution realities (see the sketch after this list). Furthermore, going through a proxy object isn't necessary; it's just more layers, since few people code directly against the interfaces.
          • multiple levels of timeouts cause unnecessary confusion. Also, the retry loops in HCM cause confusion and issues.
          • the client should support parallelism more directly; no more thread pools that just sleep!
          • lots of callables make the code harder to read; either get rid of them or use more inner classes. Jumping around files makes for difficult comprehension.
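
          The proxy problem in the first bullet above is reproducible with plain java.lang.reflect.Proxy; this standalone demo shows the checked exception getting wrapped:

              // If an InvocationHandler throws a checked exception that the
              // interface method does not declare, callers see
              // UndeclaredThrowableException, with the real InterruptedException
              // buried in getCause().
              import java.lang.reflect.InvocationHandler;
              import java.lang.reflect.Proxy;
              import java.lang.reflect.UndeclaredThrowableException;

              public class ProxyDemo {
                interface Service {
                  String call(); // note: does not declare InterruptedException
                }

                public static void main(String[] args) {
                  InvocationHandler handler = (proxy, method, methodArgs) -> {
                    throw new InterruptedException("interrupted during rpc");
                  };
                  Service s = (Service) Proxy.newProxyInstance(
                      Service.class.getClassLoader(),
                      new Class<?>[] { Service.class }, handler);
                  try {
                    s.call();
                  } catch (UndeclaredThrowableException e) {
                    System.out.println("caught: " + e + ", real cause: " + e.getCause());
                  }
                }
              }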

          Some good things:

          • the base socket handling is actually in good shape. One socket per client-rs pair is about where we want to be.
          • multiplexing requests on the same socket is good, and not spawning extra threads server side just to handle more clients is also good. Since every client will have an open socket to at least the META region, this is very important! (A sketch of the multiplexing idea follows this list.)
          • the handler pool is a natural side effect of the previous point; unbounding it might not be a good idea.
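
          A compressed sketch of the multiplexing point: many in-flight calls share one socket and are matched back to their waiters by a call id. The class and method names are hypothetical; the real client's internals differ in detail:

              // Sketch of multiplexing many in-flight calls over one connection.
              import java.util.Map;
              import java.util.concurrent.CompletableFuture;
              import java.util.concurrent.ConcurrentHashMap;
              import java.util.concurrent.atomic.AtomicInteger;

              public class MultiplexedConnection {
                private final AtomicInteger nextCallId = new AtomicInteger();
                private final Map<Integer, CompletableFuture<byte[]>> pending =
                    new ConcurrentHashMap<>();

                // Caller side: tag the request and park a future under the id.
                public CompletableFuture<byte[]> send(byte[] request) {
                  int id = nextCallId.getAndIncrement();
                  CompletableFuture<byte[]> f = new CompletableFuture<>();
                  pending.put(id, f);
                  // writeToSocket(id, request); // one shared socket per client-rs pair
                  return f;
                }

                // Reader side: a single thread demultiplexes all responses by id.
                void onResponse(int id, byte[] response) {
                  CompletableFuture<byte[]> f = pending.remove(id);
                  if (f != null) {
                    f.complete(response);
                  }
                }
              }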

          Other constraints:

          • we will want to provide an efficient blocking API; it's what is expected.
          • an async API might be nice; perhaps it can layer on top (a sketch follows this list).
          • Making HTable thread agnostic might be useful. Pooling the write buffer, or doing something else interesting there, would be necessary.
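
          And a sketch of the blocking-over-async layering, reusing the hypothetical MultiplexedConnection sketch above: one code path serves both styles, with a single timeout applied in one place.

              // The blocking call is just the async call plus a bounded wait.
              import java.util.concurrent.CompletableFuture;
              import java.util.concurrent.TimeUnit;

              public class BlockingFacade {
                private final MultiplexedConnection conn;

                public BlockingFacade(MultiplexedConnection conn) {
                  this.conn = conn;
                }

                public byte[] call(byte[] request, long timeoutMs) throws Exception {
                  CompletableFuture<byte[]> future = conn.send(request);
                  // One timeout, applied in one place; InterruptedException is let
                  // out to the caller (per the description above), not swallowed.
                  return future.get(timeoutMs, TimeUnit.MILLISECONDS);
                }
              }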
          stack created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              stack
            • Votes:
              0
              Watchers:
              10

              Dates

              • Created:
                Updated:

                Development