Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.92.0
    • Component/s: Coprocessors
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      Adds new HTable.coprocessorProxy() and HTable.coprocessorExec() calls, along with server-side hooks for registering user RPC protocol handlers. Together these allow the invocation and execution of custom server-side code configured as coprocessors.

      Adds pluggable RPC engines as a first step towards alternate RPC implementations. This adds a field for the RPC protocol interface on connection setup, so it's not RPC wire compatible with previous versions.
      Show
      Adds new HTable.coprocessorProxy() and HTable.coprocessorExec() calls, along with server-side hooks for registering user RPC protocol handlers. Together these allow the invocation and execution of custom server-side code configured as coprocessors. Adds pluggable RPC engines as a first step towards alternate RPC implementations. This adds a field for the RPC protocol interface on connection setup, so it's not RPC wire compatible with previous versions.

      Description

      "High-level call interface for clients. Unlike RPC, calls addressed to rows or ranges of rows. Coprocessor client library resolves to actual locations. Calls across multiple rows automatically split into multiple parallelized RPCs"

      Generic multicall RPC facility which incorporates this and multiget/multiput/multidelete and parallel scanners.

      Group and batch RPCs by region server. Track and retry outstanding RPCs. Ride over region relocations.

      Support addressing by explicit region identifier or by row key or row key range.

      Include a facility for merging results client side.

        Issue Links

          Activity

          Hide
          hbasereviewboard HBase Review Board added a comment -

          Message from: "Andrew Purtell" <apurtell@apache.org>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/816/#review1423
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/client/Batch.java
          <http://review.cloudera.org/r/816/#comment4956>

          Could be setCause

          src/main/java/org/apache/hadoop/hbase/client/Exec.java
          <http://review.cloudera.org/r/816/#comment4957>

          Agree, subpackage would be a pain but Javadoc can make it explicit for sure.

          src/main/java/org/apache/hadoop/hbase/client/Exec.java
          <http://review.cloudera.org/r/816/#comment4955>

          Only one instance of a given Coprocessor class can be bound to a region at a time.

          src/main/java/org/apache/hadoop/hbase/client/ExecResult.java
          <http://review.cloudera.org/r/816/#comment4958>

          This is pretty generic but currently has no other use, correct.

          src/main/java/org/apache/hadoop/hbase/client/Scan.java
          <http://review.cloudera.org/r/816/#comment4959>

          We should be getting rid of RowRange, right?

          src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java
          <http://review.cloudera.org/r/816/#comment4960>

          Good, likewise for exec() as below.

          • Andrew
          Show
          hbasereviewboard HBase Review Board added a comment - Message from: "Andrew Purtell" <apurtell@apache.org> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/816/#review1423 ----------------------------------------------------------- src/main/java/org/apache/hadoop/hbase/client/Batch.java < http://review.cloudera.org/r/816/#comment4956 > Could be setCause src/main/java/org/apache/hadoop/hbase/client/Exec.java < http://review.cloudera.org/r/816/#comment4957 > Agree, subpackage would be a pain but Javadoc can make it explicit for sure. src/main/java/org/apache/hadoop/hbase/client/Exec.java < http://review.cloudera.org/r/816/#comment4955 > Only one instance of a given Coprocessor class can be bound to a region at a time. src/main/java/org/apache/hadoop/hbase/client/ExecResult.java < http://review.cloudera.org/r/816/#comment4958 > This is pretty generic but currently has no other use, correct. src/main/java/org/apache/hadoop/hbase/client/Scan.java < http://review.cloudera.org/r/816/#comment4959 > We should be getting rid of RowRange, right? src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java < http://review.cloudera.org/r/816/#comment4960 > Good, likewise for exec() as below. Andrew
          Hide
          hbasereviewboard HBase Review Board added a comment -

          Message from: "Gary Helmling" <ghelmling@gmail.com>

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/Action.java, line 35

          > <http://review.cloudera.org/r/816/diff/9/?file=13679#file13679line35>

          >

          > Action is a new class so we are not breaking any pre-exisiting API here (Even so, erasure would reduce this API change to the old I believe anyways?).

          >

          Action and MultiAction seem to just be internal implementation classes, so I thought this would be a safe refactoring.

          You mean for something like a rolling restart? I believe the type erasure + ignoring return types on method lookup (getResult return type was parameterized so would be Object) should make this continue to work, but I may have introduced breakage elsewhere...

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 37

          > <http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line37>

          >

          > I think the fact that this class if of coprocessors needs to be highlighted better. Batch is a super generic name yet this Batch is only for CPs. A subpackage for these CP classes would be a pain would it? Any other way of grouping these CP classes? A prefix? Just throwing it out there (I'm sure you thought about it and had a reason for not going these routes).

          The name is I guess a remnant of these changes starting out as overly-generic, then scaled back to coprocessors specific implementation. I don't see any general applicability beyond that at the moment, so clearing up the naming would probably help. Don't want to get too wordy though... CoprocessorBatch?

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 50

          > <http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line50>

          >

          > cool

          >

          > But, this method's name is 'returning'?

          >

          > So, it executes the 'method' of 'protocol' and returns the hosting 'Call' whose invocation has already run?

          >

          > Should it be 'execute' or 'executeCall' or 'invokeCall', etc.

          This just returns a Batch.Call instance, whose call() method will invoke the specified CoprocessorProtocol method. So it returns a Batch.Call that returns the method result. At this point the remote invocation has not yet happened. That won't occur until down in HConnectionManager.HConnectionImplementation.processExecs(). Reached through passing the Batch.Call instance to HTable.exec(...).

          Yeah, "returning()" is a little generic too. I could rename to forMethod() or callingMethod()?

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 80

          > <http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line80>

          >

          > Do you need the 'Batch.' prefix here?

          No, it's extraneous and inconsistent with most HBase code. I'll drop it. I guess I was having a "static method invocations should be referenced by class name" moment...

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 132

          > <http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line132>

          >

          > Should this method be 'public' since its only called in here – whats returned out of a 'returning' is an exhausted call.. the receiving caller will not be doing a call invocation?

          Could make this default access I suppose. It's invoked down in HConnectionManager.HConnectionImplementation.processExecs(), which then will trigger the underlying CoprocessorProtocol method invocation and RPC call through ExecRPCInvoker.

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 140

          > <http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line140>

          >

          > I don't see Callback passed to call in the above. I suppose how Callback works will become clear later.

          Yes, Callback is passed to the second version of HTable.exec() for specific handling of Batch.Call.call() return values. The first version of HTable.exec() just uses a basic Callback that builds a Map with results.

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/Exec.java, line 65

          > <http://review.cloudera.org/r/816/diff/9/?file=13681#file13681line65>

          >

          > This is fair I suppose if only one coprocessor per region (Is that true)?

          Correct, only a single handler can be registered per CoprocessorProtocol subclass per region.

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/ExecResult.java, line 33

          > <http://review.cloudera.org/r/816/diff/9/?file=13682#file13682line33>

          >

          > This class is for CPs only?

          Yes, same as Exec above, could be renamed to CoprocessorExecResult for clarity.

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/HConnection.java, line 241

          > <http://review.cloudera.org/r/816/diff/9/?file=13683#file13683line241>

          >

          > Want to call this out as a CP method?

          >

          This is actually a refactoring of the previous processBatch() method to accommodate Exec instances as well.

          The processExecs() method below doesn't make use of it yet, but I'd like to incorporate that as an immediate next step for better RPC batching.

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/HConnection.java, line 252

          > <http://review.cloudera.org/r/816/diff/9/?file=13683#file13683line252>

          >

          > Cool

          >

          > So this would be for a single cooprocessor implementation.

          >

          > You say above that we do an rpc per row but are the rpcs run serially or in parallell? If parallel, thats sweet.

          They're parallelized using the existing HTable ExecutorService.

          But as an immediate next step, I would like to get them also batching into a single RPC call per CoprocessorProtocol method invocation, per region server. The scaffolding is already there in the (Multi)Action parameterization and HConnection.processBatchCallback(), I just need to coordinate the per RS batching through ExecRPCInvoker.

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 1036

          > <http://review.cloudera.org/r/816/diff/9/?file=13684#file13684line1036>

          >

          > Whats going on here? You are rigging the createCallable so it can be used by CPs?

          >

          > (no objection, just asking)

          Correct, this and the other (Multi)Action parameterization makes it possible to handle MultiAction<Exec> instances for full batching of the CP method calls per region server. It's just the ExecRPCInvoker coordination that isn't quite there yet.

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/HConnection.java, line 255

          > <http://review.cloudera.org/r/816/diff/9/?file=13683#file13683line255>

          >

          > So, here is a case where row designates a region, right? Not a 'row'. If all these CP classes were in a sub-package, you could do a package-info on CP w/ examples, etc. – copy/paste of the stuff you have above?

          Right, in this case the row key is just used to lookup the region location, so we can ultimately do getRegionServerWithRetries().

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 1087

          > <http://review.cloudera.org/r/816/diff/9/?file=13684#file13684line1087>

          >

          > Should this be public? Its not in HConnection, is it? Or, rather, why is it not in HConnection?

          Should be in HConnection as well.

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 1182

          > <http://review.cloudera.org/r/816/diff/9/?file=13684#file13684line1182>

          >

          > Again, this is rigging MultiAction to support the CP parameterized types?

          Correct, scaffolding for a better ExecRPCInvoker to be added.

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java, line 353

          > <http://review.cloudera.org/r/816/diff/9/?file=13686#file13686line353>

          >

          > oh, you don't have to repeat this doc up in HTable. I'd remove it and replace it '@Override'.

          Ah, ok, will remove the dupes.

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1340

          > <http://review.cloudera.org/r/816/diff/9/?file=13685#file13685line1340>

          >

          > What if I pass more than one key for a region? Will CP run twice?

          No, if both keys fall into the same region, then it should just use the start key.

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/RowRange.java, line 4

          > <http://review.cloudera.org/r/816/diff/9/?file=13689#file13689line4>

          >

          > You have to say something about 'inclusive'?

          I'll remove RowRange, since it was only used in a previous version of the HTable.exec() signatures. It's nicely parallel to Row, but with only a single implementation (Scan), seems useless.

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 112

          > <http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line112>

          >

          > good

          >

          > (but of interest, how does this differ from setCause? Or, could you pass the ite to the IOE constructor?

          Sure, will just pass the ITE as the cause.

          On 2010-10-04 23:09:49, stack wrote:

          > src/main/java/org/apache/hadoop/hbase/client/Exec.java, line 35

          > <http://review.cloudera.org/r/816/diff/9/?file=13681#file13681line35>

          >

          > ... against a Coprocessor?

          >

          > Maybe add above?

          >

          Yes, could rename CoprocessorExec for clarity. It's fairly generic but there's no other usage.

          I guess in the naming here and elsewhere, I was envisioning Coprocessors as the sort of stored procedures of HBase. A basic functionality – execute this user code – with coprocessors as the implementation. So I took a general approach to naming the client interface. It seemed to fit in with the basic operations: Get, Scan, Put, Delete, Exec.

          But if this is overly general and confusing, I have no problem renaming this and the other client bits with a "Coprocessor" prefix.

          Will definitely improve javadoc here as well.

          • Gary

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/816/#review1414
          -----------------------------------------------------------

          Show
          hbasereviewboard HBase Review Board added a comment - Message from: "Gary Helmling" <ghelmling@gmail.com> On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/Action.java, line 35 > < http://review.cloudera.org/r/816/diff/9/?file=13679#file13679line35 > > > Action is a new class so we are not breaking any pre-exisiting API here (Even so, erasure would reduce this API change to the old I believe anyways?). > Action and MultiAction seem to just be internal implementation classes, so I thought this would be a safe refactoring. You mean for something like a rolling restart? I believe the type erasure + ignoring return types on method lookup (getResult return type was parameterized so would be Object) should make this continue to work, but I may have introduced breakage elsewhere... On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 37 > < http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line37 > > > I think the fact that this class if of coprocessors needs to be highlighted better. Batch is a super generic name yet this Batch is only for CPs. A subpackage for these CP classes would be a pain would it? Any other way of grouping these CP classes? A prefix? Just throwing it out there (I'm sure you thought about it and had a reason for not going these routes). The name is I guess a remnant of these changes starting out as overly-generic, then scaled back to coprocessors specific implementation. I don't see any general applicability beyond that at the moment, so clearing up the naming would probably help. Don't want to get too wordy though... CoprocessorBatch? On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 50 > < http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line50 > > > cool > > But, this method's name is 'returning'? > > So, it executes the 'method' of 'protocol' and returns the hosting 'Call' whose invocation has already run? > > Should it be 'execute' or 'executeCall' or 'invokeCall', etc. This just returns a Batch.Call instance, whose call() method will invoke the specified CoprocessorProtocol method. So it returns a Batch.Call that returns the method result. At this point the remote invocation has not yet happened. That won't occur until down in HConnectionManager.HConnectionImplementation.processExecs(). Reached through passing the Batch.Call instance to HTable.exec(...). Yeah, "returning()" is a little generic too. I could rename to forMethod() or callingMethod()? On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 80 > < http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line80 > > > Do you need the 'Batch.' prefix here? No, it's extraneous and inconsistent with most HBase code. I'll drop it. I guess I was having a "static method invocations should be referenced by class name" moment... On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 132 > < http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line132 > > > Should this method be 'public' since its only called in here – whats returned out of a 'returning' is an exhausted call.. the receiving caller will not be doing a call invocation? Could make this default access I suppose. It's invoked down in HConnectionManager.HConnectionImplementation.processExecs(), which then will trigger the underlying CoprocessorProtocol method invocation and RPC call through ExecRPCInvoker. On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 140 > < http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line140 > > > I don't see Callback passed to call in the above. I suppose how Callback works will become clear later. Yes, Callback is passed to the second version of HTable.exec() for specific handling of Batch.Call.call() return values. The first version of HTable.exec() just uses a basic Callback that builds a Map with results. On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/Exec.java, line 65 > < http://review.cloudera.org/r/816/diff/9/?file=13681#file13681line65 > > > This is fair I suppose if only one coprocessor per region (Is that true)? Correct, only a single handler can be registered per CoprocessorProtocol subclass per region. On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/ExecResult.java, line 33 > < http://review.cloudera.org/r/816/diff/9/?file=13682#file13682line33 > > > This class is for CPs only? Yes, same as Exec above, could be renamed to CoprocessorExecResult for clarity. On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/HConnection.java, line 241 > < http://review.cloudera.org/r/816/diff/9/?file=13683#file13683line241 > > > Want to call this out as a CP method? > This is actually a refactoring of the previous processBatch() method to accommodate Exec instances as well. The processExecs() method below doesn't make use of it yet, but I'd like to incorporate that as an immediate next step for better RPC batching. On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/HConnection.java, line 252 > < http://review.cloudera.org/r/816/diff/9/?file=13683#file13683line252 > > > Cool > > So this would be for a single cooprocessor implementation. > > You say above that we do an rpc per row but are the rpcs run serially or in parallell? If parallel, thats sweet. They're parallelized using the existing HTable ExecutorService. But as an immediate next step, I would like to get them also batching into a single RPC call per CoprocessorProtocol method invocation, per region server. The scaffolding is already there in the (Multi)Action parameterization and HConnection.processBatchCallback(), I just need to coordinate the per RS batching through ExecRPCInvoker. On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 1036 > < http://review.cloudera.org/r/816/diff/9/?file=13684#file13684line1036 > > > Whats going on here? You are rigging the createCallable so it can be used by CPs? > > (no objection, just asking) Correct, this and the other (Multi)Action parameterization makes it possible to handle MultiAction<Exec> instances for full batching of the CP method calls per region server. It's just the ExecRPCInvoker coordination that isn't quite there yet. On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/HConnection.java, line 255 > < http://review.cloudera.org/r/816/diff/9/?file=13683#file13683line255 > > > So, here is a case where row designates a region, right? Not a 'row'. If all these CP classes were in a sub-package, you could do a package-info on CP w/ examples, etc. – copy/paste of the stuff you have above? Right, in this case the row key is just used to lookup the region location, so we can ultimately do getRegionServerWithRetries(). On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 1087 > < http://review.cloudera.org/r/816/diff/9/?file=13684#file13684line1087 > > > Should this be public? Its not in HConnection, is it? Or, rather, why is it not in HConnection? Should be in HConnection as well. On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 1182 > < http://review.cloudera.org/r/816/diff/9/?file=13684#file13684line1182 > > > Again, this is rigging MultiAction to support the CP parameterized types? Correct, scaffolding for a better ExecRPCInvoker to be added. On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java, line 353 > < http://review.cloudera.org/r/816/diff/9/?file=13686#file13686line353 > > > oh, you don't have to repeat this doc up in HTable. I'd remove it and replace it '@Override'. Ah, ok, will remove the dupes. On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1340 > < http://review.cloudera.org/r/816/diff/9/?file=13685#file13685line1340 > > > What if I pass more than one key for a region? Will CP run twice? No, if both keys fall into the same region, then it should just use the start key. On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/RowRange.java, line 4 > < http://review.cloudera.org/r/816/diff/9/?file=13689#file13689line4 > > > You have to say something about 'inclusive'? I'll remove RowRange, since it was only used in a previous version of the HTable.exec() signatures. It's nicely parallel to Row, but with only a single implementation (Scan), seems useless. On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/Batch.java, line 112 > < http://review.cloudera.org/r/816/diff/9/?file=13680#file13680line112 > > > good > > (but of interest, how does this differ from setCause? Or, could you pass the ite to the IOE constructor? Sure, will just pass the ITE as the cause. On 2010-10-04 23:09:49, stack wrote: > src/main/java/org/apache/hadoop/hbase/client/Exec.java, line 35 > < http://review.cloudera.org/r/816/diff/9/?file=13681#file13681line35 > > > ... against a Coprocessor? > > Maybe add above? > Yes, could rename CoprocessorExec for clarity. It's fairly generic but there's no other usage. I guess in the naming here and elsewhere, I was envisioning Coprocessors as the sort of stored procedures of HBase. A basic functionality – execute this user code – with coprocessors as the implementation. So I took a general approach to naming the client interface. It seemed to fit in with the basic operations: Get, Scan, Put, Delete, Exec. But if this is overly general and confusing, I have no problem renaming this and the other client bits with a "Coprocessor" prefix. Will definitely improve javadoc here as well. Gary ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/816/#review1414 -----------------------------------------------------------
          Hide
          hbasereviewboard HBase Review Board added a comment -

          Message from: "Gary Helmling" <ghelmling@gmail.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/816/
          -----------------------------------------------------------

          (Updated 2010-10-07 12:27:43.994582)

          Review request for hbase, stack, Andrew Purtell, and Jonathan Gray.

          Changes
          -------

          Remove unnecessary List to array conversion and copy in refactored HConnectionMananger.HConnectionImplementation.processBatch() and fix retry-mishandling in refactored code.

          Summary
          -------

          This is really two separate patches in one, though with some overlapping changes. If necessary I can split them apart for separate review. Please let me know if that would make review easier.

          Part 1:
          ==============
          Port over of HADOOP-6422 to the HBase RPC code. The goal of this change is to allow alternate RPC client/server implementations to be enabled through a simple configuration change. Ultimately I would like to use this to allow secure RPC to be enabled through configuration, while not blocking normal (current) RPC operation on non-secure Hadoop versions.

          This portion of the patch abstracts out two interfaces from the RPC code:

          RpcEngine: HBaseRPC uses this to obtain proxy instances for client calls and server instances for HMaster and HRegionServer
          RpcServer: this allows differing RPC server implementations, breaking the dependency on HBaseServer

          The bulk of the current code from HBaseRPC is moved into WritableRpcEngine and is unchanged other than the interface requirements. So the current call path remains the same, other than the HBaseRPC.getProtocolEngine() abstraction.

          Part 2:
          ===============
          The remaining changes provide server-side hooks for registering new RPC protocols/handlers (per-region to support coprocessors), and client side hooks to support dynamic execution of the registered protocols.

          The new RPC protocol actions are constrained to org.apache.hadoop.hbase.ipc.CoprocessorProtocol implementations (which extends VersionedProtocol) to prevent arbitrary execution of methods against HMasterInterface, HRegionInterface, etc.

          For protocol handler registration, HRegionServer provides a new method:

          public <T extends CoprocessorProtocol> boolean registerProtocol(
          byte[] region, Class<T> protocol, T handler)

          which builds a Map of region name to protocol instances for dispatching client calls.

          Client invocations are performed through HTable, which adds the following methods:

          public <T extends CoprocessorProtocol> T proxy(Class<T> protocol, Row row)

          This directly returns a proxy instance to the CoprocessorProtocol implementation registered for the region serving row "row". Any method calls will be proxied to the region's server and invoked using the map of registered region name -> handler instances.

          Calls directed against multiple rows are a bit more complicated. They are supported with the methods:

          public <T extends CoprocessorProtocol, R> void exec(
          Class<T> protocol, List<? extends Row> rows,
          BatchCall<T,R> callable, BatchCallback<R> callback)

          public <T extends CoprocessorProtocol, R> void exec(
          Class<T> protocol, RowRange range,
          BatchCall<T,R> callable, BatchCallback<R> callback)

          where BatchCall and BatchCallback are simple interfaces defining the methods to be called and a callback instance to be invoked for each result.

          For the sample CoprocessorProtocol interface:

          interface PingProtocol extends CoprocessorProtocol

          { public String ping(); public String hello(String name); }

          a client invocation might look like:

          final Map<byte[],R> results = new TreeMap<byte[],R>(...)
          List<Row> rows = ...
          table.exec(PingProtocol.class, rows,
          new HTable.BatchCall<PingProtocol,String>() {
          public String call(PingProtocol instance)

          { return instance.ping(); }

          },
          new BatchCallback<R>(){
          public void update(byte[] region, byte[] row, R value)

          { results.put(region, value); }

          });

          The BatchCall.call() method will be invoked for each row in the passed in list, and the BatchCallback.update() method will be invoked for each return value. However, currently the PingProtocol.ping() invocation will result in a separate RPC call per row, which is less that ideal.

          Support is in place to make use of the HRegionServer.multi() invocations for batched RPC (see the org.apache.hadoop.hbase.client.Exec class), but this does not mesh well with the current client-side interface.

          In addition to standard code review, I'd appreciate any thoughts on the client interactions in particular, and whether they would meet some of the anticipated uses of coprocessors.

          This addresses bug HBASE-2002.
          http://issues.apache.org/jira/browse/HBASE-2002

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/client/Action.java 556ea81
          src/main/java/org/apache/hadoop/hbase/client/HConnection.java 65f7618
          src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 4ad91c6
          src/main/java/org/apache/hadoop/hbase/client/HTable.java 0dbf263
          src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 74593bf
          src/main/java/org/apache/hadoop/hbase/client/MultiAction.java c6ea838
          src/main/java/org/apache/hadoop/hbase/client/MultiResponse.java 91bd04b
          src/main/java/org/apache/hadoop/hbase/client/coprocessor/Batch.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/client/coprocessor/Exec.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/client/coprocessor/ExecResult.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/client/coprocessor/package-info.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 83f623d
          src/main/java/org/apache/hadoop/hbase/ipc/ConnectionHeader.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/ipc/CoprocessorProtocol.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/ipc/ExecRPCInvoker.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java 2b5eeb6
          src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java e23a629
          src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java e4c356d
          src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java ee5dd8f
          src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/ipc/RpcEngine.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 6b800e6
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 5f829e4
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 36c404d
          src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java d4166cf
          src/main/resources/hbase-default.xml 5fafe65
          src/test/java/org/apache/hadoop/hbase/regionserver/TestServerCustomProtocol.java PRE-CREATION

          Diff: http://review.cloudera.org/r/816/diff

          Testing
          -------

          Thanks,

          Gary

          Show
          hbasereviewboard HBase Review Board added a comment - Message from: "Gary Helmling" <ghelmling@gmail.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/816/ ----------------------------------------------------------- (Updated 2010-10-07 12:27:43.994582) Review request for hbase, stack, Andrew Purtell, and Jonathan Gray. Changes ------- Remove unnecessary List to array conversion and copy in refactored HConnectionMananger.HConnectionImplementation.processBatch() and fix retry-mishandling in refactored code. Summary ------- This is really two separate patches in one, though with some overlapping changes. If necessary I can split them apart for separate review. Please let me know if that would make review easier. Part 1: ============== Port over of HADOOP-6422 to the HBase RPC code. The goal of this change is to allow alternate RPC client/server implementations to be enabled through a simple configuration change. Ultimately I would like to use this to allow secure RPC to be enabled through configuration, while not blocking normal (current) RPC operation on non-secure Hadoop versions. This portion of the patch abstracts out two interfaces from the RPC code: RpcEngine: HBaseRPC uses this to obtain proxy instances for client calls and server instances for HMaster and HRegionServer RpcServer: this allows differing RPC server implementations, breaking the dependency on HBaseServer The bulk of the current code from HBaseRPC is moved into WritableRpcEngine and is unchanged other than the interface requirements. So the current call path remains the same, other than the HBaseRPC.getProtocolEngine() abstraction. Part 2: =============== The remaining changes provide server-side hooks for registering new RPC protocols/handlers (per-region to support coprocessors), and client side hooks to support dynamic execution of the registered protocols. The new RPC protocol actions are constrained to org.apache.hadoop.hbase.ipc.CoprocessorProtocol implementations (which extends VersionedProtocol) to prevent arbitrary execution of methods against HMasterInterface, HRegionInterface, etc. For protocol handler registration, HRegionServer provides a new method: public <T extends CoprocessorProtocol> boolean registerProtocol( byte[] region, Class<T> protocol, T handler) which builds a Map of region name to protocol instances for dispatching client calls. Client invocations are performed through HTable, which adds the following methods: public <T extends CoprocessorProtocol> T proxy(Class<T> protocol, Row row) This directly returns a proxy instance to the CoprocessorProtocol implementation registered for the region serving row "row". Any method calls will be proxied to the region's server and invoked using the map of registered region name -> handler instances. Calls directed against multiple rows are a bit more complicated. They are supported with the methods: public <T extends CoprocessorProtocol, R> void exec( Class<T> protocol, List<? extends Row> rows, BatchCall<T,R> callable, BatchCallback<R> callback) public <T extends CoprocessorProtocol, R> void exec( Class<T> protocol, RowRange range, BatchCall<T,R> callable, BatchCallback<R> callback) where BatchCall and BatchCallback are simple interfaces defining the methods to be called and a callback instance to be invoked for each result. For the sample CoprocessorProtocol interface: interface PingProtocol extends CoprocessorProtocol { public String ping(); public String hello(String name); } a client invocation might look like: final Map<byte[],R> results = new TreeMap<byte[],R>(...) List<Row> rows = ... table.exec(PingProtocol.class, rows, new HTable.BatchCall<PingProtocol,String>() { public String call(PingProtocol instance) { return instance.ping(); } }, new BatchCallback<R>(){ public void update(byte[] region, byte[] row, R value) { results.put(region, value); } }); The BatchCall.call() method will be invoked for each row in the passed in list, and the BatchCallback.update() method will be invoked for each return value. However, currently the PingProtocol.ping() invocation will result in a separate RPC call per row, which is less that ideal. Support is in place to make use of the HRegionServer.multi() invocations for batched RPC (see the org.apache.hadoop.hbase.client.Exec class), but this does not mesh well with the current client-side interface. In addition to standard code review, I'd appreciate any thoughts on the client interactions in particular, and whether they would meet some of the anticipated uses of coprocessors. This addresses bug HBASE-2002 . http://issues.apache.org/jira/browse/HBASE-2002 Diffs (updated) src/main/java/org/apache/hadoop/hbase/client/Action.java 556ea81 src/main/java/org/apache/hadoop/hbase/client/HConnection.java 65f7618 src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 4ad91c6 src/main/java/org/apache/hadoop/hbase/client/HTable.java 0dbf263 src/main/java/org/apache/hadoop/hbase/client/HTableInterface.java 74593bf src/main/java/org/apache/hadoop/hbase/client/MultiAction.java c6ea838 src/main/java/org/apache/hadoop/hbase/client/MultiResponse.java 91bd04b src/main/java/org/apache/hadoop/hbase/client/coprocessor/Batch.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/client/coprocessor/Exec.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/client/coprocessor/ExecResult.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/client/coprocessor/package-info.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 83f623d src/main/java/org/apache/hadoop/hbase/ipc/ConnectionHeader.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/ipc/CoprocessorProtocol.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/ipc/ExecRPCInvoker.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java 2b5eeb6 src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java e23a629 src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java e4c356d src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java ee5dd8f src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/ipc/RpcEngine.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/master/HMaster.java 6b800e6 src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 5f829e4 src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 36c404d src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java d4166cf src/main/resources/hbase-default.xml 5fafe65 src/test/java/org/apache/hadoop/hbase/regionserver/TestServerCustomProtocol.java PRE-CREATION Diff: http://review.cloudera.org/r/816/diff Testing ------- Thanks, Gary
          Hide
          stack stack added a comment -

          Moving into 0.92 as per discussion up on dev list (Correct me if I'm making wrong move here lads).

          Show
          stack stack added a comment - Moving into 0.92 as per discussion up on dev list (Correct me if I'm making wrong move here lads).
          Hide
          hbasereviewboard HBase Review Board added a comment -

          Message from: stack@duboce.net

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/816/#review1933
          -----------------------------------------------------------

          Ship it!

          I did a quick pass over this. Most I'd seen already over in the Minjgie patch. I'm +1 getting it into TRUNK now early in the release cycle so probs. surface before release (You going to commit Andrew?).

          • stack
          Show
          hbasereviewboard HBase Review Board added a comment - Message from: stack@duboce.net ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/816/#review1933 ----------------------------------------------------------- Ship it! I did a quick pass over this. Most I'd seen already over in the Minjgie patch. I'm +1 getting it into TRUNK now early in the release cycle so probs. surface before release (You going to commit Andrew?). stack
          Hide
          ghelmling Gary Helmling added a comment -

          Patch for commit from the HBASE-2321 & HBASE-2002 patch on review board.

          The only changes from the review board patch are a merge up to the latest trunk for a clean diff, some whitespace and line wrapping fixes, and bumping the RPC version number to 28, since the connection header now contains the protocol interface name.

          Show
          ghelmling Gary Helmling added a comment - Patch for commit from the HBASE-2321 & HBASE-2002 patch on review board. The only changes from the review board patch are a merge up to the latest trunk for a clean diff, some whitespace and line wrapping fixes, and bumping the RPC version number to 28, since the connection header now contains the protocol interface name.
          Hide
          ghelmling Gary Helmling added a comment -

          Updated version of review board patch now attached

          Show
          ghelmling Gary Helmling added a comment - Updated version of review board patch now attached
          Hide
          apurtell Andrew Purtell added a comment -

          Committed to trunk. All tests pass locally.

          Show
          apurtell Andrew Purtell added a comment - Committed to trunk. All tests pass locally.
          Hide
          lars_francke Lars Francke added a comment -

          This issue was closed as part of a bulk closing operation on 2015-11-20. All issues that have been resolved and where all fixVersions have been released have been closed (following discussions on the mailing list).

          Show
          lars_francke Lars Francke added a comment - This issue was closed as part of a bulk closing operation on 2015-11-20. All issues that have been resolved and where all fixVersions have been released have been closed (following discussions on the mailing list).

            People

            • Assignee:
              ghelmling Gary Helmling
              Reporter:
              apurtell Andrew Purtell
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development