HBase
  1. HBase
  2. HBASE-3082

For ICV gets, first look in MemStore before reading StoreFiles

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: regionserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      For incrementColumnValue operations, it is possible to check MemStore for the column being incremented without sacrificing correctness. If the column is not found in MemStore, we would then have to do a normal Get that opens/checks all StoreFiles for the given Store.

      In practice, this makes increment operations significantly faster for recently/frequently incremented columns.

        Issue Links

          Activity

          Hide
          ryan rawson added a comment -

          with mixed puts and ICVs this gets tricky.

          how to do w/o a different read path piece of code?

          Show
          ryan rawson added a comment - with mixed puts and ICVs this gets tricky. how to do w/o a different read path piece of code?
          Hide
          Jonathan Gray added a comment -

          The easy was is to add a flag to get which will first only open the scanner against the memstore/snapshot, and then if not found, open the scanner again across everything.

          The harder way would be to reintroduce the Get code we ripped out (hoping not to do this).

          The first way has been implemented by Prakash and we saw big gains from it.

          Can you explain more what you mean by mixed puts/ICVs? Would those be mixed on the same column?

          Show
          Jonathan Gray added a comment - The easy was is to add a flag to get which will first only open the scanner against the memstore/snapshot, and then if not found, open the scanner again across everything. The harder way would be to reintroduce the Get code we ripped out (hoping not to do this). The first way has been implemented by Prakash and we saw big gains from it. Can you explain more what you mean by mixed puts/ICVs? Would those be mixed on the same column?
          Hide
          HBase Review Board added a comment -

          Message from: thekhemani@facebook.com

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/1053/
          -----------------------------------------------------------

          Review request for hbase and Jonathan Gray.

          Summary
          -------

          Ensure that during incrementColumnValue() the scan triggered by the get() does an early exit if it finds the KV in the memstore.

          This addresses bug HBASE-3082.
          http://issues.apache.org/jira/browse/HBASE-3082

          Diffs


          src/main/java/org/apache/hadoop/hbase/regionserver/ControlledScan.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 0e3940c
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 4775fc8

          Diff: http://review.cloudera.org/r/1053/diff

          Testing
          -------

          I have been testing it on my cluster. No unit testing yet.

          Thanks,

          khemani

          Show
          HBase Review Board added a comment - Message from: thekhemani@facebook.com ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1053/ ----------------------------------------------------------- Review request for hbase and Jonathan Gray. Summary ------- Ensure that during incrementColumnValue() the scan triggered by the get() does an early exit if it finds the KV in the memstore. This addresses bug HBASE-3082 . http://issues.apache.org/jira/browse/HBASE-3082 Diffs src/main/java/org/apache/hadoop/hbase/regionserver/ControlledScan.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 0e3940c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 4775fc8 Diff: http://review.cloudera.org/r/1053/diff Testing ------- I have been testing it on my cluster. No unit testing yet. Thanks, khemani
          Hide
          HBase Review Board added a comment -

          Message from: "Jonathan Gray" <jgray@apache.org>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/1053/#review1585
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/ControlledScan.java
          <http://review.cloudera.org/r/1053/#comment5382>

          Need to add apache licenses

          src/main/java/org/apache/hadoop/hbase/regionserver/ControlledScan.java
          <http://review.cloudera.org/r/1053/#comment5383>

          Maybe a little more description here. Like, why do we need this.

          Also, is ControlledScan the best name for this class? Not clear to me that this is "controlled" and thus the other scans are "uncontrolled"?

          If this was to be used for more later and even for this use, it's really additional configuration/options for Scans. So is an InternalScan or ModifiedScan?

          src/main/java/org/apache/hadoop/hbase/regionserver/ControlledScan.java
          <http://review.cloudera.org/r/1053/#comment5385>

          Rather than these two public booleans, we should have getters/setters.

          And even rather than just straight set/get on each boolean, these two booleans are usually one or the other, right? So I think it would be better to hide implementation detail and instead expose methods like setMemStoreOnly() and setFilesOnly() which would deal with the two booleans appropriately. Then you'd have get methods like checkMemStore() and checkFiles() or something like that.

          src/main/java/org/apache/hadoop/hbase/regionserver/ControlledScan.java
          <http://review.cloudera.org/r/1053/#comment5384>

          Whitespace

          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          <http://review.cloudera.org/r/1053/#comment5386>

          Need to cleanup whitespace here.

          Maybe a little bit of javadoc for get_last_increment which describes what this does (and why it's still "correct" for increments)

          • Jonathan
          Show
          HBase Review Board added a comment - Message from: "Jonathan Gray" <jgray@apache.org> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1053/#review1585 ----------------------------------------------------------- src/main/java/org/apache/hadoop/hbase/regionserver/ControlledScan.java < http://review.cloudera.org/r/1053/#comment5382 > Need to add apache licenses src/main/java/org/apache/hadoop/hbase/regionserver/ControlledScan.java < http://review.cloudera.org/r/1053/#comment5383 > Maybe a little more description here. Like, why do we need this. Also, is ControlledScan the best name for this class? Not clear to me that this is "controlled" and thus the other scans are "uncontrolled"? If this was to be used for more later and even for this use, it's really additional configuration/options for Scans. So is an InternalScan or ModifiedScan? src/main/java/org/apache/hadoop/hbase/regionserver/ControlledScan.java < http://review.cloudera.org/r/1053/#comment5385 > Rather than these two public booleans, we should have getters/setters. And even rather than just straight set/get on each boolean, these two booleans are usually one or the other, right? So I think it would be better to hide implementation detail and instead expose methods like setMemStoreOnly() and setFilesOnly() which would deal with the two booleans appropriately. Then you'd have get methods like checkMemStore() and checkFiles() or something like that. src/main/java/org/apache/hadoop/hbase/regionserver/ControlledScan.java < http://review.cloudera.org/r/1053/#comment5384 > Whitespace src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java < http://review.cloudera.org/r/1053/#comment5386 > Need to cleanup whitespace here. Maybe a little bit of javadoc for get_last_increment which describes what this does (and why it's still "correct" for increments) Jonathan
          Hide
          HBase Review Board added a comment -

          Message from: "Benoit Sigoure" <tsunanet@gmail.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/1053/#review1586
          -----------------------------------------------------------

          Just some minor comments.

          src/main/java/org/apache/hadoop/hbase/regionserver/ControlledScan.java
          <http://review.cloudera.org/r/1053/#comment5387>

          This class doesn't need to be public, make it package-private.

          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          <http://review.cloudera.org/r/1053/#comment5389>

          The style used in HBase mandates that this method be named getLastIncrement.

          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          <http://review.cloudera.org/r/1053/#comment5388>

          A `//'-style comment would be better here

          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          <http://review.cloudera.org/r/1053/#comment5390>

          use if (!results.isEmpty())

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          <http://review.cloudera.org/r/1053/#comment5391>

          Instead of two nested `if', you could combine both conditions with `&&'

          • Benoit
          Show
          HBase Review Board added a comment - Message from: "Benoit Sigoure" <tsunanet@gmail.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1053/#review1586 ----------------------------------------------------------- Just some minor comments. src/main/java/org/apache/hadoop/hbase/regionserver/ControlledScan.java < http://review.cloudera.org/r/1053/#comment5387 > This class doesn't need to be public, make it package-private. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java < http://review.cloudera.org/r/1053/#comment5389 > The style used in HBase mandates that this method be named getLastIncrement. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java < http://review.cloudera.org/r/1053/#comment5388 > A `//'-style comment would be better here src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java < http://review.cloudera.org/r/1053/#comment5390 > use if (!results.isEmpty()) src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java < http://review.cloudera.org/r/1053/#comment5391 > Instead of two nested `if', you could combine both conditions with `&&' Benoit
          Hide
          HBase Review Board added a comment -

          Message from: thekhemani@facebook.com

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/1053/
          -----------------------------------------------------------

          (Updated 2010-10-20 21:31:17.831889)

          Review request for hbase and Jonathan Gray.

          Changes
          -------

          I think I have taken care of all the review feedback

          Summary
          -------

          Ensure that during incrementColumnValue() the scan triggered by the get() does an early exit if it finds the KV in the memstore.

          This addresses bug HBASE-3082.
          http://issues.apache.org/jira/browse/HBASE-3082

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 0e3940c
          src/main/java/org/apache/hadoop/hbase/regionserver/InternalScan.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 4775fc8

          Diff: http://review.cloudera.org/r/1053/diff

          Testing
          -------

          I have been testing it on my cluster. No unit testing yet.

          Thanks,

          khemani

          Show
          HBase Review Board added a comment - Message from: thekhemani@facebook.com ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1053/ ----------------------------------------------------------- (Updated 2010-10-20 21:31:17.831889) Review request for hbase and Jonathan Gray. Changes ------- I think I have taken care of all the review feedback Summary ------- Ensure that during incrementColumnValue() the scan triggered by the get() does an early exit if it finds the KV in the memstore. This addresses bug HBASE-3082 . http://issues.apache.org/jira/browse/HBASE-3082 Diffs (updated) src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 0e3940c src/main/java/org/apache/hadoop/hbase/regionserver/InternalScan.java PRE-CREATION src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 4775fc8 Diff: http://review.cloudera.org/r/1053/diff Testing ------- I have been testing it on my cluster. No unit testing yet. Thanks, khemani
          Hide
          HBase Review Board added a comment -

          Message from: "Ryan Rawson" <ryanobjc@gmail.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/1053/#review1596
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/InternalScan.java
          <http://review.cloudera.org/r/1053/#comment5396>

          Why not put these into the base class? Why create an entirely new class just to hold these?

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          <http://review.cloudera.org/r/1053/#comment5397>

          Would this code be cleaner if we didnt have to use instanceof?

          • Ryan
          Show
          HBase Review Board added a comment - Message from: "Ryan Rawson" <ryanobjc@gmail.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1053/#review1596 ----------------------------------------------------------- src/main/java/org/apache/hadoop/hbase/regionserver/InternalScan.java < http://review.cloudera.org/r/1053/#comment5396 > Why not put these into the base class? Why create an entirely new class just to hold these? src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java < http://review.cloudera.org/r/1053/#comment5397 > Would this code be cleaner if we didnt have to use instanceof? Ryan
          Hide
          HBase Review Board added a comment -

          Message from: thekhemani@facebook.com

          On 2010-10-20 21:41:23, Ryan Rawson wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/InternalScan.java, line 45

          > <http://review.cloudera.org/r/1053/diff/2/?file=15062#file15062line45>

          >

          > Why not put these into the base class? Why create an entirely new class just to hold these?

          It is not added to the base class so as not to alter the public Scan API.

          On 2010-10-20 21:41:23, Ryan Rawson wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java, line 155

          > <http://review.cloudera.org/r/1053/diff/2/?file=15063#file15063line155>

          >

          > Would this code be cleaner if we didnt have to use instanceof?

          yes, i agree it will be cleaner w/o instanceof. But in that case we will have to pass the flags all the way from HRegion to StoreScanner. A number of signatures will have to change.

          Another approach could be to only instantiate InternalScan in the server side. But again that is quite a bit of code change.

          Having an internal-scan should be useful. After all, a scan touches a lot of data and certain other tasks can potentially be piggybacked on a scan.

          • khemani

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/1053/#review1596
          -----------------------------------------------------------

          Show
          HBase Review Board added a comment - Message from: thekhemani@facebook.com On 2010-10-20 21:41:23, Ryan Rawson wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/InternalScan.java, line 45 > < http://review.cloudera.org/r/1053/diff/2/?file=15062#file15062line45 > > > Why not put these into the base class? Why create an entirely new class just to hold these? It is not added to the base class so as not to alter the public Scan API. On 2010-10-20 21:41:23, Ryan Rawson wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java, line 155 > < http://review.cloudera.org/r/1053/diff/2/?file=15063#file15063line155 > > > Would this code be cleaner if we didnt have to use instanceof? yes, i agree it will be cleaner w/o instanceof. But in that case we will have to pass the flags all the way from HRegion to StoreScanner. A number of signatures will have to change. Another approach could be to only instantiate InternalScan in the server side. But again that is quite a bit of code change. Having an internal-scan should be useful. After all, a scan touches a lot of data and certain other tasks can potentially be piggybacked on a scan. khemani ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1053/#review1596 -----------------------------------------------------------
          Hide
          HBase Review Board added a comment -

          Message from: "Ryan Rawson" <ryanobjc@gmail.com>

          On 2010-10-20 21:41:23, Ryan Rawson wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java, line 155

          > <http://review.cloudera.org/r/1053/diff/2/?file=15063#file15063line155>

          >

          > Would this code be cleaner if we didnt have to use instanceof?

          khemani wrote:

          yes, i agree it will be cleaner w/o instanceof. But in that case we will have to pass the flags all the way from HRegion to StoreScanner. A number of signatures will have to change.

          Another approach could be to only instantiate InternalScan in the server side. But again that is quite a bit of code change.

          Having an internal-scan should be useful. After all, a scan touches a lot of data and certain other tasks can potentially be piggybacked on a scan.

          ok that makes sense. let's go with it then.

          • Ryan

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/1053/#review1596
          -----------------------------------------------------------

          Show
          HBase Review Board added a comment - Message from: "Ryan Rawson" <ryanobjc@gmail.com> On 2010-10-20 21:41:23, Ryan Rawson wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java, line 155 > < http://review.cloudera.org/r/1053/diff/2/?file=15063#file15063line155 > > > Would this code be cleaner if we didnt have to use instanceof? khemani wrote: yes, i agree it will be cleaner w/o instanceof. But in that case we will have to pass the flags all the way from HRegion to StoreScanner. A number of signatures will have to change. Another approach could be to only instantiate InternalScan in the server side. But again that is quite a bit of code change. Having an internal-scan should be useful. After all, a scan touches a lot of data and certain other tasks can potentially be piggybacked on a scan. ok that makes sense. let's go with it then. Ryan ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1053/#review1596 -----------------------------------------------------------
          Hide
          HBase Review Board added a comment -

          Message from: "Ryan Rawson" <ryanobjc@gmail.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/1053/#review1626
          -----------------------------------------------------------

          Ship it!

          • Ryan
          Show
          HBase Review Board added a comment - Message from: "Ryan Rawson" <ryanobjc@gmail.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1053/#review1626 ----------------------------------------------------------- Ship it! Ryan
          Hide
          Jonathan Gray added a comment -

          I will commit this in a bit, got a little bit of cleanup on the comments.

          Show
          Jonathan Gray added a comment - I will commit this in a bit, got a little bit of cleanup on the comments.
          Hide
          Jonathan Gray added a comment -

          Patch being committed. This patch is from Prakash and reviewed/approved by Ryan.

          I cleaned up some of the comments and lines >80 and also one minor change from patch up on RB. I'm also working on HBASE-2946 right now and changed the getLastIncrement() method to support multiple columns rather than just one.

          Show
          Jonathan Gray added a comment - Patch being committed. This patch is from Prakash and reviewed/approved by Ryan. I cleaned up some of the comments and lines >80 and also one minor change from patch up on RB. I'm also working on HBASE-2946 right now and changed the getLastIncrement() method to support multiple columns rather than just one.
          Hide
          Jonathan Gray added a comment -

          Extra change to pom.xml in the patch but was not committed.

          This has been committed to trunk. Thanks Prakash, great work!

          Show
          Jonathan Gray added a comment - Extra change to pom.xml in the patch but was not committed. This has been committed to trunk. Thanks Prakash, great work!
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2746 (See https://builds.apache.org/job/HBase-TRUNK/2746/)
          HBASE-3443 ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix (Revision 1325406)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2746 (See https://builds.apache.org/job/HBase-TRUNK/2746/ ) HBASE-3443 ICV optimization to look in memstore first and then store files ( HBASE-3082 ) does not work when deletes are in the mix (Revision 1325406) Result = SUCCESS larsh : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94 #107 (See https://builds.apache.org/job/HBase-0.94/107/)
          HBASE-3443 ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix (Revision 1325453)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          Show
          Hudson added a comment - Integrated in HBase-0.94 #107 (See https://builds.apache.org/job/HBase-0.94/107/ ) HBASE-3443 ICV optimization to look in memstore first and then store files ( HBASE-3082 ) does not work when deletes are in the mix (Revision 1325453) Result = SUCCESS larsh : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #169 (See https://builds.apache.org/job/HBase-TRUNK-security/169/)
          HBASE-3443 ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix (Revision 1325406)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK-security #169 (See https://builds.apache.org/job/HBase-TRUNK-security/169/ ) HBASE-3443 ICV optimization to look in memstore first and then store files ( HBASE-3082 ) does not work when deletes are in the mix (Revision 1325406) Result = FAILURE larsh : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94-security #9 (See https://builds.apache.org/job/HBase-0.94-security/9/)
          HBASE-3443 ICV optimization to look in memstore first and then store files (HBASE-3082) does not work when deletes are in the mix (Revision 1325453)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          Show
          Hudson added a comment - Integrated in HBase-0.94-security #9 (See https://builds.apache.org/job/HBase-0.94-security/9/ ) HBASE-3443 ICV optimization to look in memstore first and then store files ( HBASE-3082 ) does not work when deletes are in the mix (Revision 1325453) Result = SUCCESS larsh : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java

            People

            • Assignee:
              Prakash Khemani
              Reporter:
              Jonathan Gray
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development