HBase
  1. HBase
  2. HBASE-1845

MultiGet, MultiDelete, and MultiPut - batched to the appropriate region servers

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: None
    • Labels:
      None

      Description

      I've started to create a general interface for doing these batch/multi calls and would like to get some input and thoughts about how we should handle this and what the protocol should
      look like.

      First naive patch, coming soon.

      1. multi-v1.patch
        31 kB
        Erik Holstad
      2. batch.patch
        36 kB
        Erik Holstad
      3. hbase-1845_0.20.3.patch
        37 kB
        Marc Limotte
      4. hbase-1845_0.20.5.patch
        64 kB
        Marc Limotte
      5. hbase-1845-trunk.patch
        67 kB
        Marc Limotte

        Issue Links

          Activity

          Hide
          Erik Holstad added a comment -

          Added a first draft of something that is a way to deal with batching of all types of actions, Deletes, Gets and Puts.

          The patch includes a couple of different classes to deal with sending and receiving theses multi calls. The calls to the different Region servers
          are threaded on the client, but the instantiation is happening in the call at the moment, will be moved so we don't have to do it every time.
          The return format from all the calls multi(List<Delete>), multi(List<Get>) and multi(List<Put>) all return the same type Result [] of the same length
          as the request to the specific RS. For all elements where the Result[i] == null we have to retry the action, from the client. This common return type
          adds a little bit of overhead for returns of Puts and Deletes but is needed to keep it simple and generic.

          Changed the Row interface a little to be WritableComparable<Row>, needed for the new methods.

          I have only done sanity check tests so far and are going to add more test to cover corner cases, but just wanted
          to get the first draft out there to get feedback.

          Show
          Erik Holstad added a comment - Added a first draft of something that is a way to deal with batching of all types of actions, Deletes, Gets and Puts. The patch includes a couple of different classes to deal with sending and receiving theses multi calls. The calls to the different Region servers are threaded on the client, but the instantiation is happening in the call at the moment, will be moved so we don't have to do it every time. The return format from all the calls multi(List<Delete>), multi(List<Get>) and multi(List<Put>) all return the same type Result [] of the same length as the request to the specific RS. For all elements where the Result [i] == null we have to retry the action, from the client. This common return type adds a little bit of overhead for returns of Puts and Deletes but is needed to keep it simple and generic. Changed the Row interface a little to be WritableComparable<Row>, needed for the new methods. I have only done sanity check tests so far and are going to add more test to cover corner cases, but just wanted to get the first draft out there to get feedback.
          Hide
          Erik Holstad added a comment -

          Fixed some issues after comments from Jonathan, also moved most of the code into the HConnectionManager to not pollute
          HTable too much. Renamed the call from multi to batch and fixed some bugs related to the grouping into HRS maps.

          Show
          Erik Holstad added a comment - Fixed some issues after comments from Jonathan, also moved most of the code into the HConnectionManager to not pollute HTable too much. Renamed the call from multi to batch and fixed some bugs related to the grouping into HRS maps.
          Hide
          Erik Holstad added a comment -

          Small changes

          Show
          Erik Holstad added a comment - Small changes
          Hide
          stack added a comment -

          I think this patch looks great Holstad.

          Here are some initial comments.

          Deprecate the old means of batching in favor of your new method? You can have the old methods call into your new method?

          The check a list is sorted probably takes as long as actual sort?

          If only one item in list, avoid sort? Its an optimization we have elsehwere.

          Multi should be renamed. Name it Rows? MultiResults should be renamed Results? Says in class comment that Multi is about Gets but its more general than this?

          Row has to be a Writable? Is it OK making it WritableComparable? I was going to make Row implement Comparable but thought that it'd narrow our ability to mix it in? If Row is WritableComparable, then should change how Get, Put, etc. implement removing the Comparable and Writable and just use Row?

          Can we batch Puts, Deletes, and Gets? Or does the batch have to be pure – all of one type?

          Whats the threading story? HTable has a pool of executors used for batching. So, this would be a continuation of HTable not being thread-safe. It looks like HCM should be fine if used by many threads?

          Testing of failure mid-put with verification that we recover properly would be nice, especially if could be done in a unit test?

          Show
          stack added a comment - I think this patch looks great Holstad. Here are some initial comments. Deprecate the old means of batching in favor of your new method? You can have the old methods call into your new method? The check a list is sorted probably takes as long as actual sort? If only one item in list, avoid sort? Its an optimization we have elsehwere. Multi should be renamed. Name it Rows? MultiResults should be renamed Results? Says in class comment that Multi is about Gets but its more general than this? Row has to be a Writable? Is it OK making it WritableComparable? I was going to make Row implement Comparable but thought that it'd narrow our ability to mix it in? If Row is WritableComparable, then should change how Get, Put, etc. implement removing the Comparable and Writable and just use Row? Can we batch Puts, Deletes, and Gets? Or does the batch have to be pure – all of one type? Whats the threading story? HTable has a pool of executors used for batching. So, this would be a continuation of HTable not being thread-safe. It looks like HCM should be fine if used by many threads? Testing of failure mid-put with verification that we recover properly would be nice, especially if could be done in a unit test?
          Hide
          ryan rawson added a comment -

          when we batch, the old code used to batch by region, which is not the most performant, instead we should batch by regionserver. the recovery might be a little more complex, since did we fail due to split or due to regions moving?

          Show
          ryan rawson added a comment - when we batch, the old code used to batch by region, which is not the most performant, instead we should batch by regionserver. the recovery might be a little more complex, since did we fail due to split or due to regions moving?
          Hide
          Jonathan Gray added a comment -

          @ryan this groups by regionserver, and sends out to each in threads. it definitely adds complexity to recovery, but is solvable and at least partially solved in the current patch, though not well tested yet.

          Show
          Jonathan Gray added a comment - @ryan this groups by regionserver, and sends out to each in threads. it definitely adds complexity to recovery, but is solvable and at least partially solved in the current patch, though not well tested yet.
          Hide
          ryan rawson added a comment -

          great, i havent been looking at patches yet, too busy. much better approach, i look forward to seeing it speed up some of my jobs.

          Show
          ryan rawson added a comment - great, i havent been looking at patches yet, too busy. much better approach, i look forward to seeing it speed up some of my jobs.
          Hide
          Erik Holstad added a comment -

          @Stack

          Deprecating sounds good to me, will have the old calls use this new one behind the scenes if we decide to use this.

          I would say that checking if the list is sorted will probably go pretty fast, usually, since it breaks at the first occurrence of an unsorted element, but I'm open to remove it, would like to do
          some timing test to see if it is worth it or not before making the decision.

          Will add the check for one element

          Like the name Results, but not really liking Rows but it is not a big deal to me either. Think that batch is kinda ok.

          The reason that Row needs to be writable is the same as for Filter. Will make it WritableComparable, so that will lead to Gets, Puts and Deletes just implement Row. Not sure that this is good though, to hide those inside Row?

          As the code is written now, you can mix and match all of the types into one batch, but I need to do more testing on this and we might need to extend the compare code to include something more than row.

          Yeah, I guess that it what that means, Not really sure what would be the best alternative here and if we are looking to make HTable thread safe?

          I have done some test for recovery, that are not in the patch, but those required modifying the actual code, but will try to figure something out so we can put it in a unit test.

          Show
          Erik Holstad added a comment - @Stack Deprecating sounds good to me, will have the old calls use this new one behind the scenes if we decide to use this. I would say that checking if the list is sorted will probably go pretty fast, usually, since it breaks at the first occurrence of an unsorted element, but I'm open to remove it, would like to do some timing test to see if it is worth it or not before making the decision. Will add the check for one element Like the name Results, but not really liking Rows but it is not a big deal to me either. Think that batch is kinda ok. The reason that Row needs to be writable is the same as for Filter. Will make it WritableComparable, so that will lead to Gets, Puts and Deletes just implement Row. Not sure that this is good though, to hide those inside Row? As the code is written now, you can mix and match all of the types into one batch, but I need to do more testing on this and we might need to extend the compare code to include something more than row. Yeah, I guess that it what that means, Not really sure what would be the best alternative here and if we are looking to make HTable thread safe? I have done some test for recovery, that are not in the patch, but those required modifying the actual code, but will try to figure something out so we can put it in a unit test.
          Hide
          Erik Holstad added a comment -

          Will try to move the sorting into the server instead of the client so it will be done on smaller lists and in parallel on the individual servers.

          Show
          Erik Holstad added a comment - Will try to move the sorting into the server instead of the client so it will be done on smaller lists and in parallel on the individual servers.
          Hide
          stack added a comment -

          HTable is not thread-safe, for updates anyways. Lets go w/ that. Just make sure that changes down in HCM are.

          Would be sweet if didn't have to do all testing spinning up a minicluster.

          Show
          stack added a comment - HTable is not thread-safe, for updates anyways. Lets go w/ that. Just make sure that changes down in HCM are. Would be sweet if didn't have to do all testing spinning up a minicluster.
          Hide
          Andrew Purtell added a comment -

          I assume that write buffering of single operations will wrap the new multi calls. Currently there is a wart with respect to the write buffer.

          From a Trend dev team:

          When insert rows into one table by calling the method public synchronized void put(final Put put), if the column family of one row does not exist, the insert operation will failed and throw NoSuchColumnFamilyException.. We observed that all the following insert operation will fails even though all of them have valid column family. That is one exception of insert operation can cause failure of all the following insert operation.

          Their further analysis explains in detail the scenario, which I will summarize here:

          1) An invalid put is added to the writeBuffer by put(Put put). It will trigger a NoSuchColumnFamilyException once it goes to the region server.

          2) At some point, the buffer is flushed.

          3) When the invalid put is processed, an exception is thrown. The finally clause of flushCommits() removes all successful puts from the writebuffer list but the failed put remains at the top.

          4) Subsequent puts will add more entries to the write buffer but the first entry on the list is invalid so eventually every Put will throw an exception once the buffer limit .

          I don't see how the patch on this issue handles this. The invalid entries will be retried here over and over as well.

          A workaround with the current write buffering in HTable is for the client to call getWriteBuffer() and remove the entry at the head of the list manually.

          Show
          Andrew Purtell added a comment - I assume that write buffering of single operations will wrap the new multi calls. Currently there is a wart with respect to the write buffer. From a Trend dev team: When insert rows into one table by calling the method public synchronized void put(final Put put), if the column family of one row does not exist, the insert operation will failed and throw NoSuchColumnFamilyException.. We observed that all the following insert operation will fails even though all of them have valid column family. That is one exception of insert operation can cause failure of all the following insert operation. Their further analysis explains in detail the scenario, which I will summarize here: 1) An invalid put is added to the writeBuffer by put(Put put). It will trigger a NoSuchColumnFamilyException once it goes to the region server. 2) At some point, the buffer is flushed. 3) When the invalid put is processed, an exception is thrown. The finally clause of flushCommits() removes all successful puts from the writebuffer list but the failed put remains at the top. 4) Subsequent puts will add more entries to the write buffer but the first entry on the list is invalid so eventually every Put will throw an exception once the buffer limit . I don't see how the patch on this issue handles this. The invalid entries will be retried here over and over as well. A workaround with the current write buffering in HTable is for the client to call getWriteBuffer() and remove the entry at the head of the list manually.
          Hide
          Erik Holstad added a comment -

          @Andrew
          I haven't worked on this for a little while, but they way I was thinking it is that the failed inserts gets returned separately from the successful once and that only the failed are retried.
          If the code that you have looked at doesn't do that, it is wrong.

          About the write buffer I'm not sure how we want to do, since I think that we are gong to be able to mix calls, get, puts and deletes in a single multi call, so we have to decide if this is something that makes sense and in that case maybe not use the write buffer. But like I said earlier, havn't looked at the code recently, so don't exactly remember.

          Erik

          Show
          Erik Holstad added a comment - @Andrew I haven't worked on this for a little while, but they way I was thinking it is that the failed inserts gets returned separately from the successful once and that only the failed are retried. If the code that you have looked at doesn't do that, it is wrong. About the write buffer I'm not sure how we want to do, since I think that we are gong to be able to mix calls, get, puts and deletes in a single multi call, so we have to decide if this is something that makes sense and in that case maybe not use the write buffer. But like I said earlier, havn't looked at the code recently, so don't exactly remember. Erik
          Hide
          ryan rawson added a comment -

          Hey guys any update on this patch? Should I beta test it and see if I can shake out and hopefully fix some bugs?

          Show
          ryan rawson added a comment - Hey guys any update on this patch? Should I beta test it and see if I can shake out and hopefully fix some bugs?
          Hide
          Erik Holstad added a comment -

          @ Ryan!
          Sorry that I haven't been to active on this, but have had some other stuff to work on.
          But it would be great if you would test it out so I could get some feedback.

          Show
          Erik Holstad added a comment - @ Ryan! Sorry that I haven't been to active on this, but have had some other stuff to work on. But it would be great if you would test it out so I could get some feedback.
          Hide
          Marc Limotte added a comment -

          After some discussion with Jonathan, it seems that Erik is no longer actively working on this patch. I'd like to help out.

          So far, I've updated the patch, so that it applies to hbase-0.20.3, and I've expanded some of the unit tests. I have not yet implemented the other comments in this ticket and there's still some other clean up to do.

          Also, I haven't looked in detail at HBASE-2066, but need to think about how it impacts this issue. I'm not sure, yet, but I don't think it's a replacement of the MultiPut functionality here.

          Show
          Marc Limotte added a comment - After some discussion with Jonathan, it seems that Erik is no longer actively working on this patch. I'd like to help out. So far, I've updated the patch, so that it applies to hbase-0.20.3, and I've expanded some of the unit tests. I have not yet implemented the other comments in this ticket and there's still some other clean up to do. Also, I haven't looked in detail at HBASE-2066 , but need to think about how it impacts this issue. I'm not sure, yet, but I don't think it's a replacement of the MultiPut functionality here.
          Hide
          Erik Holstad added a comment -

          Marc, have at it, good luck and let me know if you have any questions.

          Erik

          Show
          Erik Holstad added a comment - Marc, have at it, good luck and let me know if you have any questions. Erik
          Hide
          ryan rawson added a comment -

          this won't be able to go into 0.20 for the same reason 2066 cant - it requires a version bump in the RPC. Even adding methods changes the on-wire protocol alas.

          Show
          ryan rawson added a comment - this won't be able to go into 0.20 for the same reason 2066 cant - it requires a version bump in the RPC. Even adding methods changes the on-wire protocol alas.
          Hide
          Karthik K added a comment -

          To optimize puts - avoid the expensive array copy , that is currently done due to the limitation of the RPC.

          Show
          Karthik K added a comment - To optimize puts - avoid the expensive array copy , that is currently done due to the limitation of the RPC.
          Hide
          Marc Limotte added a comment -

          Ryan,

          Good point. Maybe there is time to get it into 0.21?

          Show
          Marc Limotte added a comment - Ryan, Good point. Maybe there is time to get it into 0.21?
          Hide
          stack added a comment -

          Marc: I'd aim for 0.21, yes. Also, see hbase-2209 for more on what Kay Kay is on about.

          Show
          stack added a comment - Marc: I'd aim for 0.21, yes. Also, see hbase-2209 for more on what Kay Kay is on about.
          Hide
          Marc Limotte added a comment -

          Stack, thanks for the pointer on 2209.

          To shoot for 0.21, I should work against the trunk? Or is there a separate branch for what will be 0.21?

          Show
          Marc Limotte added a comment - Stack, thanks for the pointer on 2209. To shoot for 0.21, I should work against the trunk? Or is there a separate branch for what will be 0.21?
          Hide
          ryan rawson added a comment -

          you should work this against the branch for 0.20.4 - will will introduce HBASE-2219 to the branch and allow the addition of methods in patches from now on. So aim for the branch and we can get it in for 0.20.4 - sounds good?

          Show
          ryan rawson added a comment - you should work this against the branch for 0.20.4 - will will introduce HBASE-2219 to the branch and allow the addition of methods in patches from now on. So aim for the branch and we can get it in for 0.20.4 - sounds good?
          Hide
          Marc Limotte added a comment -

          This is a completely new implementation of HBASE-1845, based on the work Ryan did for MultiPut (HBASE-2066)

          I also applied a lot of the code from HBASE-2421, which had a lot of useful improvements.

          The unit tests are in "src/test/org/apache/hadoop/hbase/TestMultiParallel.java", which also replaces "src/test/org/apache/hadoop/hbase/TestMultiParallelPut.java". it includes a test with a forced failure of one RegionServer.

          I've deprecated the old batch methods and made the implementations of those method call this new one.

          Show
          Marc Limotte added a comment - This is a completely new implementation of HBASE-1845 , based on the work Ryan did for MultiPut ( HBASE-2066 ) I also applied a lot of the code from HBASE-2421 , which had a lot of useful improvements. The unit tests are in "src/test/org/apache/hadoop/hbase/TestMultiParallel.java", which also replaces "src/test/org/apache/hadoop/hbase/TestMultiParallelPut.java". it includes a test with a forced failure of one RegionServer. I've deprecated the old batch methods and made the implementations of those method call this new one.
          Hide
          HBase Review Board added a comment -

          Message from: "Marc Limotte" <mslimotte@gmail.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/151/
          -----------------------------------------------------------

          Review request for hbase.

          Summary
          -------

          Updated the patch to work on the trunk. New unit tests are in TestMultiParallel.java, which replaces TestMultiParallelPut.java.

          This addresses bug HBASE-1845.

          Diffs


          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Action.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Get.java 951973
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java 951973
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 951973
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 951973
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiAction.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiPut.java 951973
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiPutResponse.java 951973
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiResponse.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Row.java 951973
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 951973
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 951973
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 951973
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestMultiParallel.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestMultiParallelPut.java 951973

          Diff: http://review.hbase.org/r/151/diff

          Testing
          -------

          Thanks,

          Marc

          Show
          HBase Review Board added a comment - Message from: "Marc Limotte" <mslimotte@gmail.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/151/ ----------------------------------------------------------- Review request for hbase. Summary ------- Updated the patch to work on the trunk. New unit tests are in TestMultiParallel.java, which replaces TestMultiParallelPut.java. This addresses bug HBASE-1845 . Diffs http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Action.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Get.java 951973 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java 951973 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 951973 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java 951973 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiAction.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiPut.java 951973 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiPutResponse.java 951973 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiResponse.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Row.java 951973 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 951973 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 951973 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 951973 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestMultiParallel.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestMultiParallelPut.java 951973 Diff: http://review.hbase.org/r/151/diff Testing ------- Thanks, Marc
          Hide
          Jeff Hammerbacher added a comment -

          Hey,

          Forgive my ignorance, but how is a MultiGet different from a Scan with a filter? I suppose I understand from a query execution perspective, but it feels a bit odd to have different client APIs for grabbing multiple rows.

          Thanks,
          Jeff

          Show
          Jeff Hammerbacher added a comment - Hey, Forgive my ignorance, but how is a MultiGet different from a Scan with a filter? I suppose I understand from a query execution perspective, but it feels a bit odd to have different client APIs for grabbing multiple rows. Thanks, Jeff
          Hide
          ryan rawson added a comment -

          multi-get is more of a parallel random get, not like a scan at all.

          On Mon, Jun 7, 2010 at 11:14 PM, Jeff Hammerbacher (JIRA)

          Show
          ryan rawson added a comment - multi-get is more of a parallel random get, not like a scan at all. On Mon, Jun 7, 2010 at 11:14 PM, Jeff Hammerbacher (JIRA)
          Hide
          Jeff Hammerbacher added a comment -

          As I mentioned, I see the difference in execution strategies, but it would be nice to have HBase take care of that for you depending on the selectivity of the Scan/MultiGet. As a concrete example, a FilterList full of RowFilters looks an awful lot like a MultiGet.

          Show
          Jeff Hammerbacher added a comment - As I mentioned, I see the difference in execution strategies, but it would be nice to have HBase take care of that for you depending on the selectivity of the Scan/MultiGet. As a concrete example, a FilterList full of RowFilters looks an awful lot like a MultiGet.
          Hide
          Marc Limotte added a comment -

          I think you're right that you could accomplish some of the same things with the Scan/Filter semantics. Multi has a couple of additional capabilities.

          Scan is limited to one table. Multi can do Gets from lots of different tables. This applies to families or columns, too.

          Second, Multi lets you mix and match Get, Put and Delete. It's not clear to me how often that will happen, but it's there.

          Show
          Marc Limotte added a comment - I think you're right that you could accomplish some of the same things with the Scan/Filter semantics. Multi has a couple of additional capabilities. Scan is limited to one table. Multi can do Gets from lots of different tables. This applies to families or columns, too. Second, Multi lets you mix and match Get, Put and Delete. It's not clear to me how often that will happen, but it's there.
          Hide
          stack added a comment -

          .bq ...but how is a MultiGet different from a Scan with a filter? I suppose I understand from a query execution perspective, but it feels a bit odd to have different client APIs for grabbing multiple rows.

          MultiGet ia parallel fetching of many random rows. For small numbers (hundreds/thousands?) itt should run faster than the (usually) single-threaded full table scan plus filter that only returns those rows that pass the filter. There is an inflection point at which the scan becomes faster than MultiGet but you'd have be returning a good percentage of the table for that to be the case. Also, a filter to return some random set of the table content would have to carry all rows I believe, unless there is a pattern to the wanted row keys that you can extract; it'd be pretty fat w/ payload.

          (HBASE-1935 is scanning in parallel. It has a patch attached).

          Show
          stack added a comment - .bq ...but how is a MultiGet different from a Scan with a filter? I suppose I understand from a query execution perspective, but it feels a bit odd to have different client APIs for grabbing multiple rows. MultiGet ia parallel fetching of many random rows. For small numbers (hundreds/thousands?) itt should run faster than the (usually) single-threaded full table scan plus filter that only returns those rows that pass the filter. There is an inflection point at which the scan becomes faster than MultiGet but you'd have be returning a good percentage of the table for that to be the case. Also, a filter to return some random set of the table content would have to carry all rows I believe, unless there is a pattern to the wanted row keys that you can extract; it'd be pretty fat w/ payload. ( HBASE-1935 is scanning in parallel. It has a patch attached).
          Hide
          Jonathan Gray added a comment -

          A parallel scan w/ filters could be approaching the same performance as a MultiGet, but after we implement some performance features we still don't have. As of now, we are not aggressive with re-seeking during a scan, we basically will always do full scans of all blocks even if we want to jump to the next row for example, we'll keep reading through the previous row. See HBASE-1517 and stuff like HBASE-2517 and HBASE-2450.

          The second requirement would be adding more calls into filters to allow them to push-down seeking information. For example, after every KV passed into the filter, there could be a call to ask if the scan should re-seek (return null if it needs to see every KV still, or return the KV to seek to if for example it knows it's done with the current row and it knows the next row it wants).

          So I think we will eventually get to the point where MultiGet and parallel scans with filters become virtually the same thing... but it's going to be a while. I think keeping Gets around and adding MultiGet gives us an easy target for optimization and won't require adding further complexity to filters just yet.

          Show
          Jonathan Gray added a comment - A parallel scan w/ filters could be approaching the same performance as a MultiGet, but after we implement some performance features we still don't have. As of now, we are not aggressive with re-seeking during a scan, we basically will always do full scans of all blocks even if we want to jump to the next row for example, we'll keep reading through the previous row. See HBASE-1517 and stuff like HBASE-2517 and HBASE-2450 . The second requirement would be adding more calls into filters to allow them to push-down seeking information. For example, after every KV passed into the filter, there could be a call to ask if the scan should re-seek (return null if it needs to see every KV still, or return the KV to seek to if for example it knows it's done with the current row and it knows the next row it wants). So I think we will eventually get to the point where MultiGet and parallel scans with filters become virtually the same thing... but it's going to be a while. I think keeping Gets around and adding MultiGet gives us an easy target for optimization and won't require adding further complexity to filters just yet.
          Hide
          Benoit Sigoure added a comment -

          Hello,
          I just became aware of this issue. I haven't read all the comments and haven't looked at the patches yet, but I'd like to draw your attention to HBASE-2898 and so you can make sure that whatever you do, you don't reproduce this issue. It'd be nice if this issue solved HBASE-2898 as a side-effect of rewriting multiPut as part of the multi-everything code. I'll take a look at the code proposed here when time permits.

          Show
          Benoit Sigoure added a comment - Hello, I just became aware of this issue. I haven't read all the comments and haven't looked at the patches yet, but I'd like to draw your attention to HBASE-2898 and so you can make sure that whatever you do, you don't reproduce this issue. It'd be nice if this issue solved HBASE-2898 as a side-effect of rewriting multiPut as part of the multi-everything code. I'll take a look at the code proposed here when time permits.
          Hide
          HBase Review Board added a comment -

          Message from: "Benoit Sigoure" <tsunanet@gmail.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/151/#review905
          -----------------------------------------------------------

          -1 overall.

          The good thing about this change is that it addresses a ~third of HBASE-2898.
          The very bad thing about this change is that it generalizes the disease that started with MultiPut, which is described in HBASE-2898. I really don't want HBASE-2898 to spread like cancer. I believe multi-everything is the way to go, but we need to change more things to also cure HBASE-2898 instead of spreading it.

          The MultiResponse does the right thing in the sense that it gives us the result of each individual Action, instead of what the old MultiPutResponse does which is to give us the "index" of the first failed Put. This is necessary but not sufficient. When there's a failure for a particular Action, the client needs to know what the failure was for this particular Action. If it's a NoSuchColumnFamilyException, then the client should bail out immediately and escalate the exception to the user. Other exceptions (like NotServingRegionException) can be handled by the client. Maybe instead of storing a Result per Action, we could store an Object with is either a Result or an Exception.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Action.java
          <http://review.cloudera.org/r/151/#comment2944>

          Missing copyright header.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Action.java
          <http://review.cloudera.org/r/151/#comment2946>

          Missing javadoc.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Action.java
          <http://review.cloudera.org/r/151/#comment2945>

          Why is the `Row' called `action'?
          edit: Ah OK I get it, `Row' is a very poorly named interface. Ignore my comment.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Action.java
          <http://review.cloudera.org/r/151/#comment2943>

          Kill ALL trailing whitespaces in all the files.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Action.java
          <http://review.cloudera.org/r/151/#comment2973>

          I don't like the fact that you have to store the original order of the `Action' instances. I understand you need to do this because the server sorts the actions, but I'd be in favor of doing the sorting on the client side and removing this unnecessary complication.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Get.java
          <http://review.cloudera.org/r/151/#comment2947>

          why is the argument named `del'?

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          <http://review.cloudera.org/r/151/#comment2971>

          Please move the @return after all the @param.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          <http://review.cloudera.org/r/151/#comment2948>

          Please use the same style as in the rest of the file.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          <http://review.cloudera.org/r/151/#comment2969>

          I don't see the point in doing that.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          <http://review.cloudera.org/r/151/#comment2970>

          No spaces before the `[]'.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment2949>

          No spaces inside parentheses. Stick to the existing coding style.
          It doesn't seem necessary to cast `list' to a `List'.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment2950>

          This is not an IOException, it's an IllegalArgumentException.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment2951>

          Instead of doing that, give `list' in argument to the constructor of ArrayList, so that the list can be sized and constructed properly immediately. This avoids an unnecessary array resize.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment2952>

          Unnecessary parentheses on the RHS.
          Also this variable can be declared final.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment2953>

          This is bad. Don't do that.
          Please read http://www.ibm.com/developerworks/java/library/j-jtp05236.html

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment2954>

          Missing spaces after the commas.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment2955>

          Missing space after the comma. Please fix all of those.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment2956>

          Above you used just `Entry', here you use `Map.Entry'. Stick to one or the other.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment2957>

          Missing spaces around the equal sign.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment2958>

          This is also improperly handling InterruptedException. I realize the code was already wrong but since you're changing it, you may as well fix it.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment2972>

          Remove the unnecessary cast to `DoNotRetryIOException'.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment2959>

          Missing spaces around `=', `<' and after the last `;'.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment2960>

          Remove the extra spaces inside the parentheses and the unnecessary cast to a plain `List'.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
          <http://review.cloudera.org/r/151/#comment2961>

          Thanks for writing that javadoc.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiAction.java
          <http://review.cloudera.org/r/151/#comment2962>

          Technically, users won't use this class, so it doesn't need to be made `public'. Also, it could be marked as `final'.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiAction.java
          <http://review.cloudera.org/r/151/#comment2963>

          Bad English: "with the attempting to locate"

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiPut.java
          <http://review.cloudera.org/r/151/#comment2964>

          When adding a deprecation warning, is it possible in Java to specify an arbitrary message to explain what should be used instead? If not, then please at least mention that in the javadoc.
          This also holds true for MultiPutResponse below.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          <http://review.cloudera.org/r/151/#comment2965>

          Poorly named variable. It's not a row, it's an action...

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          <http://review.cloudera.org/r/151/#comment2966>

          The fact that you have to introspect the type in order to know what to do makes me feel uneasy. It shows that you're using the wrong interface for what you're trying to achieve.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          <http://review.cloudera.org/r/151/#comment2967>

          Don't throw an exception without a message describing why the exception was thrown.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          <http://review.cloudera.org/r/151/#comment2968>

          Remove this unnecessary line you added.

          • Benoit
          Show
          HBase Review Board added a comment - Message from: "Benoit Sigoure" <tsunanet@gmail.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/151/#review905 ----------------------------------------------------------- -1 overall. The good thing about this change is that it addresses a ~third of HBASE-2898 . The very bad thing about this change is that it generalizes the disease that started with MultiPut, which is described in HBASE-2898 . I really don't want HBASE-2898 to spread like cancer. I believe multi-everything is the way to go, but we need to change more things to also cure HBASE-2898 instead of spreading it. The MultiResponse does the right thing in the sense that it gives us the result of each individual Action, instead of what the old MultiPutResponse does which is to give us the "index" of the first failed Put. This is necessary but not sufficient. When there's a failure for a particular Action, the client needs to know what the failure was for this particular Action. If it's a NoSuchColumnFamilyException, then the client should bail out immediately and escalate the exception to the user. Other exceptions (like NotServingRegionException) can be handled by the client. Maybe instead of storing a Result per Action, we could store an Object with is either a Result or an Exception. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Action.java < http://review.cloudera.org/r/151/#comment2944 > Missing copyright header. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Action.java < http://review.cloudera.org/r/151/#comment2946 > Missing javadoc. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Action.java < http://review.cloudera.org/r/151/#comment2945 > Why is the `Row' called `action'? edit: Ah OK I get it, `Row' is a very poorly named interface. Ignore my comment. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Action.java < http://review.cloudera.org/r/151/#comment2943 > Kill ALL trailing whitespaces in all the files. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Action.java < http://review.cloudera.org/r/151/#comment2973 > I don't like the fact that you have to store the original order of the `Action' instances. I understand you need to do this because the server sorts the actions, but I'd be in favor of doing the sorting on the client side and removing this unnecessary complication. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Get.java < http://review.cloudera.org/r/151/#comment2947 > why is the argument named `del'? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java < http://review.cloudera.org/r/151/#comment2971 > Please move the @return after all the @param. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java < http://review.cloudera.org/r/151/#comment2948 > Please use the same style as in the rest of the file. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java < http://review.cloudera.org/r/151/#comment2969 > I don't see the point in doing that. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java < http://review.cloudera.org/r/151/#comment2970 > No spaces before the `[]'. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment2949 > No spaces inside parentheses. Stick to the existing coding style. It doesn't seem necessary to cast `list' to a `List'. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment2950 > This is not an IOException, it's an IllegalArgumentException. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment2951 > Instead of doing that, give `list' in argument to the constructor of ArrayList, so that the list can be sized and constructed properly immediately. This avoids an unnecessary array resize. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment2952 > Unnecessary parentheses on the RHS. Also this variable can be declared final. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment2953 > This is bad. Don't do that. Please read http://www.ibm.com/developerworks/java/library/j-jtp05236.html http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment2954 > Missing spaces after the commas. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment2955 > Missing space after the comma. Please fix all of those. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment2956 > Above you used just `Entry', here you use `Map.Entry'. Stick to one or the other. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment2957 > Missing spaces around the equal sign. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment2958 > This is also improperly handling InterruptedException. I realize the code was already wrong but since you're changing it, you may as well fix it. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment2972 > Remove the unnecessary cast to `DoNotRetryIOException'. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment2959 > Missing spaces around `=', `<' and after the last `;'. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment2960 > Remove the extra spaces inside the parentheses and the unnecessary cast to a plain `List'. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java < http://review.cloudera.org/r/151/#comment2961 > Thanks for writing that javadoc. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiAction.java < http://review.cloudera.org/r/151/#comment2962 > Technically, users won't use this class, so it doesn't need to be made `public'. Also, it could be marked as `final'. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiAction.java < http://review.cloudera.org/r/151/#comment2963 > Bad English: "with the attempting to locate" http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiPut.java < http://review.cloudera.org/r/151/#comment2964 > When adding a deprecation warning, is it possible in Java to specify an arbitrary message to explain what should be used instead? If not, then please at least mention that in the javadoc. This also holds true for MultiPutResponse below. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java < http://review.cloudera.org/r/151/#comment2965 > Poorly named variable. It's not a row, it's an action... http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java < http://review.cloudera.org/r/151/#comment2966 > The fact that you have to introspect the type in order to know what to do makes me feel uneasy. It shows that you're using the wrong interface for what you're trying to achieve. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java < http://review.cloudera.org/r/151/#comment2967 > Don't throw an exception without a message describing why the exception was thrown. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java < http://review.cloudera.org/r/151/#comment2968 > Remove this unnecessary line you added. Benoit
          Hide
          HBase Review Board added a comment -

          Message from: "Ted Yu" <ted_yu@yahoo.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/151/#review972
          -----------------------------------------------------------

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment3162>

          Should call get(long timeout, TimeUnit unit) so that we can process MultiResponse from other RS
          Should we introduce timeout for step 3?

          • Ted
          Show
          HBase Review Board added a comment - Message from: "Ted Yu" <ted_yu@yahoo.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/151/#review972 ----------------------------------------------------------- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment3162 > Should call get(long timeout, TimeUnit unit) so that we can process MultiResponse from other RS Should we introduce timeout for step 3? Ted
          Hide
          Marc Limotte added a comment -

          Updated patch hbase-1845-trunk.patch. Addresses the review board comments from benoit and Ted. And updated to work with the current trunk.

          Show
          Marc Limotte added a comment - Updated patch hbase-1845-trunk.patch. Addresses the review board comments from benoit and Ted. And updated to work with the current trunk.
          Hide
          HBase Review Board added a comment -

          Message from: "Marc Limotte" <mslimotte@gmail.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/151/#review1057
          -----------------------------------------------------------

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Get.java
          <http://review.cloudera.org/r/151/#comment3362>

          oversight. fixed now.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          <http://review.cloudera.org/r/151/#comment3364>

          fixed

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          <http://review.cloudera.org/r/151/#comment3363>

          fixed

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          <http://review.cloudera.org/r/151/#comment3367>

          i made the return type void instead.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          <http://review.cloudera.org/r/151/#comment3365>

          fixed

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment3352>

          Fixing spaces.

          Compile error without (List) cast: method processBatch(List<Row> ...) is not applicable for argument type ArrayList<Put>.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment3353>

          fixed

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment3354>

          fixed

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment3368>

          I think the parens aid readability.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment3355>

          I have abort and maintain the thread interupted status now.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment3369>

          fixed

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment3356>

          I added a 1000 ms timeout for the future.get. Hope that is a reasonable time limit. I still allow for a retry in this case. Not confident this is the right behavior.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment3357>

          fixed. Sticking with Entry.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment3358>

          fixed

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment3359>

          fixed, as above.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment3360>

          Cast is necessary. Otherwise method would have to declare itself as Throws Throwable (the return type of e.getCause()).

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment3361>

          fixed

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          <http://review.cloudera.org/r/151/#comment3370>

          spaces fixed

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiAction.java
          <http://review.cloudera.org/r/151/#comment3371>

          marked final, but needs to be public for use in HRegionInterface.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiAction.java
          <http://review.cloudera.org/r/151/#comment3372>

          thanks

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiPut.java
          <http://review.cloudera.org/r/151/#comment3373>

          fixed - message can go after @deprecated in javadoc

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          <http://review.cloudera.org/r/151/#comment3374>

          fixed

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          <http://review.cloudera.org/r/151/#comment3375>

          Could refactor to add a "doAction()" method to each Action delegate?

          • Marc
          Show
          HBase Review Board added a comment - Message from: "Marc Limotte" <mslimotte@gmail.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/151/#review1057 ----------------------------------------------------------- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Get.java < http://review.cloudera.org/r/151/#comment3362 > oversight. fixed now. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java < http://review.cloudera.org/r/151/#comment3364 > fixed http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java < http://review.cloudera.org/r/151/#comment3363 > fixed http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java < http://review.cloudera.org/r/151/#comment3367 > i made the return type void instead. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java < http://review.cloudera.org/r/151/#comment3365 > fixed http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment3352 > Fixing spaces. Compile error without (List) cast: method processBatch(List<Row> ...) is not applicable for argument type ArrayList<Put>. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment3353 > fixed http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment3354 > fixed http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment3368 > I think the parens aid readability. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment3355 > I have abort and maintain the thread interupted status now. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment3369 > fixed http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment3356 > I added a 1000 ms timeout for the future.get. Hope that is a reasonable time limit. I still allow for a retry in this case. Not confident this is the right behavior. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment3357 > fixed. Sticking with Entry. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment3358 > fixed http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment3359 > fixed, as above. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment3360 > Cast is necessary. Otherwise method would have to declare itself as Throws Throwable (the return type of e.getCause()). http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment3361 > fixed http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java < http://review.cloudera.org/r/151/#comment3370 > spaces fixed http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiAction.java < http://review.cloudera.org/r/151/#comment3371 > marked final, but needs to be public for use in HRegionInterface. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiAction.java < http://review.cloudera.org/r/151/#comment3372 > thanks http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MultiPut.java < http://review.cloudera.org/r/151/#comment3373 > fixed - message can go after @deprecated in javadoc http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java < http://review.cloudera.org/r/151/#comment3374 > fixed http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java < http://review.cloudera.org/r/151/#comment3375 > Could refactor to add a "doAction()" method to each Action delegate? Marc
          Hide
          Marc Limotte added a comment -

          Note. This patch doesn't address HBASE-2898 yet. Although I agree that HBASE-2898 is a valid concern.

          Show
          Marc Limotte added a comment - Note. This patch doesn't address HBASE-2898 yet. Although I agree that HBASE-2898 is a valid concern.
          Hide
          ryan rawson added a comment -

          committed this to trunk. Thanks for the contribution Marc!

          Show
          ryan rawson added a comment - committed this to trunk. Thanks for the contribution Marc!
          Hide
          HBase Review Board added a comment -

          Message from: "Ryan Rawson" <ryanobjc@gmail.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/151/#review1064
          -----------------------------------------------------------

          Ship it!

          • Ryan
          Show
          HBase Review Board added a comment - Message from: "Ryan Rawson" <ryanobjc@gmail.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/151/#review1064 ----------------------------------------------------------- Ship it! Ryan
          Hide
          Jonathan Gray added a comment -

          Thanks for the great work, Marc!

          Show
          Jonathan Gray added a comment - Thanks for the great work, Marc!
          Hide
          stack added a comment -

          @Marc Thanks for persevering... how long was it since you put up your first patch?

          Show
          stack added a comment - @Marc Thanks for persevering... how long was it since you put up your first patch?
          Hide
          Marc Limotte added a comment -

          Almost 9 months, Stack. Do I get a tshirt?

          And thanks to everyone for their support.

          Show
          Marc Limotte added a comment - Almost 9 months, Stack. Do I get a tshirt? And thanks to everyone for their support.
          Hide
          stack added a comment -

          Do I get a tshirt?

          Wrong project (<- injoke).

          If we do t-shirts, you first in line.

          Show
          stack added a comment - Do I get a tshirt? Wrong project (<- injoke). If we do t-shirts, you first in line.
          Hide
          Lars Francke added a comment -

          It seems as if the HBASE-2692 commit accidentally backed this out again: http://github.com/apache/hbase/commit/be4c32c28d22ddaad78cd4ced9019ca0b3f5c83e

          Show
          Lars Francke added a comment - It seems as if the HBASE-2692 commit accidentally backed this out again: http://github.com/apache/hbase/commit/be4c32c28d22ddaad78cd4ced9019ca0b3f5c83e
          Hide
          stack added a comment -

          Looks like I messed up this commit w/ master rewrite patch; reopening till I fix it.

          Show
          stack added a comment - Looks like I messed up this commit w/ master rewrite patch; reopening till I fix it.
          Hide
          stack added a comment -

          Sorry about that Lars (and Marc). Reapplied.

          Show
          stack added a comment - Sorry about that Lars (and Marc). Reapplied.
          Hide
          stack added a comment -

          Closing. I reapplied the pieces of the patch squashed by new master commit a while back.

          Show
          stack added a comment - Closing. I reapplied the pieces of the patch squashed by new master commit a while back.
          Hide
          ryan rawson added a comment -

          an addition to this issue, the as-is unit test was testing the order of mixed-operations, in that they were applied in the order they were provided. That is not realistic, especially if we want to implement HBASE-2985, which does the Puts last in 1 batch. The regionserver should not be constrained to the most optimal order of operations by the API.

          Show
          ryan rawson added a comment - an addition to this issue, the as-is unit test was testing the order of mixed-operations, in that they were applied in the order they were provided. That is not realistic, especially if we want to implement HBASE-2985 , which does the Puts last in 1 batch. The regionserver should not be constrained to the most optimal order of operations by the API.
          Hide
          stack added a comment -

          @Ryan Want to open new issue to address?

          Show
          stack added a comment - @Ryan Want to open new issue to address?

            People

            • Assignee:
              Marc Limotte
              Reporter:
              Erik Holstad
            • Votes:
              8 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development