Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-7279

Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.4, 0.95.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Did some profiling again.
      I we can gain some performance [1] when passing buffer, rowoffset, and rowlength instead of making a copy of the row key.
      That way we can also remove the row key caching (and this patch also removes the timestamps caching). Considering the sheer number in which we create KVs, every byte save is good.

      [1] (15-20% when data is in the block cache we setup a Filter such that only a single row is returned to the client).

      1. 7279-0.96-v3.txt
        11 kB
        Lars Hofhansl
      2. 7279-0.96-v2.txt
        11 kB
        Lars Hofhansl
      3. 7279-0.94-v4.txt
        11 kB
        Lars Hofhansl
      4. 7279-0.96-v1.txt
        10 kB
        Lars Hofhansl
      5. 7279-0.94-v3.txt
        9 kB
        Lars Hofhansl
      6. 7279-0.94-v2.txt
        9 kB
        Lars Hofhansl
      7. 7279-0.94.txt
        9 kB
        Lars Hofhansl

        Issue Links

          Activity

          Hide
          hudson Hudson added a comment -

          Integrated in HBase-0.94-security-on-Hadoop-23 #10 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/10/)
          HBASE-7279 Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher (Revision 1417716)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Result.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
          Show
          hudson Hudson added a comment - Integrated in HBase-0.94-security-on-Hadoop-23 #10 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/10/ ) HBASE-7279 Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher (Revision 1417716) Result = FAILURE larsh : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/KeyValue.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Result.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
          Hide
          hudson Hudson added a comment -

          Integrated in HBase-0.94-security #86 (See https://builds.apache.org/job/HBase-0.94-security/86/)
          HBASE-7279 Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher (Revision 1417716)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Result.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
          Show
          hudson Hudson added a comment - Integrated in HBase-0.94-security #86 (See https://builds.apache.org/job/HBase-0.94-security/86/ ) HBASE-7279 Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher (Revision 1417716) Result = SUCCESS larsh : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/KeyValue.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Result.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
          Hide
          hudson Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #285 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/285/)
          HBASE-7279 Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher (Revision 1417715)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
          Show
          hudson Hudson added a comment - Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #285 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/285/ ) HBASE-7279 Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher (Revision 1417715) Result = FAILURE larsh : Files : /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
          Hide
          hudson Hudson added a comment -

          Integrated in HBase-0.94 #611 (See https://builds.apache.org/job/HBase-0.94/611/)
          HBASE-7279 Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher (Revision 1417716)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Result.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
          Show
          hudson Hudson added a comment - Integrated in HBase-0.94 #611 (See https://builds.apache.org/job/HBase-0.94/611/ ) HBASE-7279 Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher (Revision 1417716) Result = FAILURE larsh : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/KeyValue.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Result.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
          Hide
          hudson Hudson added a comment -

          Integrated in HBase-TRUNK #3594 (See https://builds.apache.org/job/HBase-TRUNK/3594/)
          HBASE-7279 Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher (Revision 1417715)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
          Show
          hudson Hudson added a comment - Integrated in HBase-TRUNK #3594 (See https://builds.apache.org/job/HBase-TRUNK/3594/ ) HBASE-7279 Avoid copying the rowkey in RegionScanner, StoreScanner, and ScanQueryMatcher (Revision 1417715) Result = FAILURE larsh : Files : /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
          Hide
          lhofhansl Lars Hofhansl added a comment -

          committed to 0.94 and 0.96, thanks for the review

          Show
          lhofhansl Lars Hofhansl added a comment - committed to 0.94 and 0.96, thanks for the review
          Hide
          lhofhansl Lars Hofhansl added a comment -

          Ran all tests in trunk with 7279-0.96-v3 and they all pass.
          Going to commit.

          Show
          lhofhansl Lars Hofhansl added a comment - Ran all tests in trunk with 7279-0.96-v3 and they all pass. Going to commit.
          Hide
          lhofhansl Lars Hofhansl added a comment -

          A reset of the rowcounter (for intra row pagination new to 0.96) got lost in the patch.
          Found that through running the tests locally.

          Show
          lhofhansl Lars Hofhansl added a comment - A reset of the rowcounter (for intra row pagination new to 0.96) got lost in the patch. Found that through running the tests locally.
          Hide
          stack stack added a comment -

          +1

          Show
          stack stack added a comment - +1
          Hide
          lhofhansl Lars Hofhansl added a comment -

          Missed the test classes.

          Show
          lhofhansl Lars Hofhansl added a comment - Missed the test classes.
          Hide
          mcorgan Matt Corgan added a comment -

          Yeah - we can't do it with the current immutable KeyValues because of garbage and/or memory bloat, but the encoded scanners reuse a backing object where you can cache as many values as you want because you're reusing everything. See BufferedDataBlockEncoder.SeekerState (which i'm making implement the Cell interface). The SeekerState gets reused as the scanner trots along.

          Show
          mcorgan Matt Corgan added a comment - Yeah - we can't do it with the current immutable KeyValues because of garbage and/or memory bloat, but the encoded scanners reuse a backing object where you can cache as many values as you want because you're reusing everything. See BufferedDataBlockEncoder.SeekerState (which i'm making implement the Cell interface). The SeekerState gets reused as the scanner trots along.
          Hide
          lhofhansl Lars Hofhansl added a comment -

          Matt Corgan Won't we then produce even more garbage? We can do that for KVs as well, but then we'll produce a lot of garbage. Or are you saying that works in your case, because you'll be reusing cells?

          Show
          lhofhansl Lars Hofhansl added a comment - Matt Corgan Won't we then produce even more garbage? We can do that for KVs as well, but then we'll produce a lot of garbage. Or are you saying that works in your case, because you'll be reusing cells?
          Hide
          lhofhansl Lars Hofhansl added a comment -

          And a 0.96 version

          Show
          lhofhansl Lars Hofhansl added a comment - And a 0.96 version
          Hide
          lhofhansl Lars Hofhansl added a comment -

          0.94 patch with correct heapsize (needed to all remove a REFERENCE).

          Show
          lhofhansl Lars Hofhansl added a comment - 0.94 patch with correct heapsize (needed to all remove a REFERENCE).
          Hide
          mcorgan Matt Corgan added a comment -

          on a related note - one of the benefits of the mutable Cell implementations is that the first time a cell gets parsed out of the data block, we can store all the offset/length variables in nice fast int primitives. I'm trying to convert the existing SeekerState to this right now.

          When the cells are travelling through all the scanners/heaps/filters, the methods like getQualifierLength() will simply return the already-calculated primitive int. With plain KeyValue as it is now, each time getQualifierLength() is called you have to do all of the following, and it may get called many times on the way from disk to client:

            @Override
            public short getRowLength() {
              return Bytes.toShort(this.bytes, getKeyOffset());
            }
            public int getFamilyOffset(int rlength) {
              return this.offset + ROW_OFFSET + Bytes.SIZEOF_SHORT + rlength + Bytes.SIZEOF_BYTE;
            }
            @Override
            public byte getFamilyLength() {
              return getFamilyLength(getFamilyOffset());
            }
            @Override
            public int getQualifierLength() {
              return getQualifierLength(getRowLength(),getFamilyLength());
            }
            public int getQualifierLength(int rlength, int flength) {
              return getKeyLength() - (int) getKeyDataStructureSize(rlength, flength, 0);
            }
            public static long getKeyDataStructureSize(int rlength, int flength, int qlength) {
              return KeyValue.KEY_INFRASTRUCTURE_SIZE + rlength + flength + qlength;
            }
          
          Show
          mcorgan Matt Corgan added a comment - on a related note - one of the benefits of the mutable Cell implementations is that the first time a cell gets parsed out of the data block, we can store all the offset/length variables in nice fast int primitives. I'm trying to convert the existing SeekerState to this right now. When the cells are travelling through all the scanners/heaps/filters, the methods like getQualifierLength() will simply return the already-calculated primitive int. With plain KeyValue as it is now, each time getQualifierLength() is called you have to do all of the following, and it may get called many times on the way from disk to client: @Override public short getRowLength() { return Bytes.toShort( this .bytes, getKeyOffset()); } public int getFamilyOffset( int rlength) { return this .offset + ROW_OFFSET + Bytes.SIZEOF_SHORT + rlength + Bytes.SIZEOF_BYTE; } @Override public byte getFamilyLength() { return getFamilyLength(getFamilyOffset()); } @Override public int getQualifierLength() { return getQualifierLength(getRowLength(),getFamilyLength()); } public int getQualifierLength( int rlength, int flength) { return getKeyLength() - ( int ) getKeyDataStructureSize(rlength, flength, 0); } public static long getKeyDataStructureSize( int rlength, int flength, int qlength) { return KeyValue.KEY_INFRASTRUCTURE_SIZE + rlength + flength + qlength; }
          Hide
          lhofhansl Lars Hofhansl added a comment -

          Cool. I'll make a trunk version for HadoopQA.

          Show
          lhofhansl Lars Hofhansl added a comment - Cool. I'll make a trunk version for HadoopQA.
          Hide
          stack stack added a comment -

          Lars Hofhansl Sorry. Remove timestamp caching.

          Show
          stack stack added a comment - Lars Hofhansl Sorry. Remove timestamp caching.
          Hide
          lhofhansl Lars Hofhansl added a comment -

          While I'm at it, might as well correct the array size in the Result(List) constructor.

          Show
          lhofhansl Lars Hofhansl added a comment - While I'm at it, might as well correct the array size in the Result(List) constructor.
          Hide
          lhofhansl Lars Hofhansl added a comment -

          Matt Corgan Thanks for point me at that. Probably not the right choice here. The local offset/length variables are just stack allocated and hence do not produce any garbage (in the GC sense).

          Show
          lhofhansl Lars Hofhansl added a comment - Matt Corgan Thanks for point me at that. Probably not the right choice here. The local offset/length variables are just stack allocated and hence do not produce any garbage (in the GC sense).
          Hide
          lhofhansl Lars Hofhansl added a comment - - edited

          Stack You mean leave out the timestamp cache, or leave out the change that removes the timestamp cache? I can go either way.

          However, 8 bytes is not insignificant (the rest of a KV is just 16 + 24 + 4 + 4 + 4 + 8 = 52). (makes me want to remove the keyLength cache as well for another 4 bytes)

          At Salesforce we're doing some scans over close to 1bn rows/kvs (most of which won't be shipped to the client).
          The issue with the timestamp cache is that it will use 8 bytes, whether we cache anything or not. Over the 1bn KVs we'll produce 8GB of garbage just for this cache.

          I would like to put this into 0.94 as well.

          Show
          lhofhansl Lars Hofhansl added a comment - - edited Stack You mean leave out the timestamp cache, or leave out the change that removes the timestamp cache? I can go either way. However, 8 bytes is not insignificant (the rest of a KV is just 16 + 24 + 4 + 4 + 4 + 8 = 52). (makes me want to remove the keyLength cache as well for another 4 bytes) At Salesforce we're doing some scans over close to 1bn rows/kvs (most of which won't be shipped to the client). The issue with the timestamp cache is that it will use 8 bytes, whether we cache anything or not. Over the 1bn KVs we'll produce 8GB of garbage just for this cache. I would like to put this into 0.94 as well.
          Hide
          mcorgan Matt Corgan added a comment -

          (we could envision an "ArrayPtr" object, which holds a reference to an array, offset, and length, but then that would another object to create)

          Lars - I submitted a ByteRange class when the Cell interface was committed which is exactly the ArrayPtr. I use it extensively in the PrefixTree module to make the code more robust and readable, but I always pool and reuse them in the hot code paths. It makes byte[]'s as easy to use as Strings for comparisons, copying, substrings, collections, sorting, deduping, etc. Sounds like not the right choice for this situation, but thought I'd point it out so you know about it.

          Show
          mcorgan Matt Corgan added a comment - (we could envision an "ArrayPtr" object, which holds a reference to an array, offset, and length, but then that would another object to create) Lars - I submitted a ByteRange class when the Cell interface was committed which is exactly the ArrayPtr. I use it extensively in the PrefixTree module to make the code more robust and readable, but I always pool and reuse them in the hot code paths. It makes byte[]'s as easy to use as Strings for comparisons, copying, substrings, collections, sorting, deduping, etc. Sounds like not the right choice for this situation, but thought I'd point it out so you know about it.
          Hide
          stack stack added a comment -

          Leave it out I'd say. We'll be profiling 0.96. Should show its head again if a prob. Can put it back then.

          Show
          stack stack added a comment - Leave it out I'd say. We'll be profiling 0.96. Should show its head again if a prob. Can put it back then.
          Hide
          lhofhansl Lars Hofhansl added a comment -

          peeked is a local stack reference, so that should be ok

          Re: how to verify the row cache is not needed... I looked at every caller of KeyValue.getRow(). These are either:

          1. tests
          2. not a hot code path (like getting the first key during a split)
          3. or from inspection it can be seen that the KV is not used twice

          I think we're good on that front.

          I am less certain about the timestamp cache, so I could put that back, and we leave that for another patch (after removing the timestamp cache I was not observing any change - neither speedup nor slowdown).

          Show
          lhofhansl Lars Hofhansl added a comment - peeked is a local stack reference, so that should be ok Re: how to verify the row cache is not needed... I looked at every caller of KeyValue.getRow(). These are either: tests not a hot code path (like getting the first key during a split) or from inspection it can be seen that the KV is not used twice I think we're good on that front. I am less certain about the timestamp cache, so I could put that back, and we leave that for another patch (after removing the timestamp cache I was not observing any change - neither speedup nor slowdown).
          Hide
          stack stack added a comment -

          I don't care about whether it pretty or not. Its fine writing a bit more code for some savings. How you think we test more if we are losing out by removing the row and ts caches?

          Patch looks good.

          Could peeked change under you while you do the below? Or is it single thread only in here?

          + byte[] row = peeked.getBuffer();
          + int offset = peeked.getRowOffset();
          + short length = peeked.getRowLength();

          Show
          stack stack added a comment - I don't care about whether it pretty or not. Its fine writing a bit more code for some savings. How you think we test more if we are losing out by removing the row and ts caches? Patch looks good. Could peeked change under you while you do the below? Or is it single thread only in here? + byte[] row = peeked.getBuffer(); + int offset = peeked.getRowOffset(); + short length = peeked.getRowLength();
          Hide
          lhofhansl Lars Hofhansl added a comment -

          Here's a 0.94 patch (that's where I have all my testing setup).
          If folks like this patch I'll make a trunk version.

          You'll notice that things aren't quite as pretty anymore, with the byte[] + offset and length needing to be passed around.

          (we could envision an "ArrayPtr" object, which holds a reference to an array, offset, and length, but then that would another object to create)

          Show
          lhofhansl Lars Hofhansl added a comment - Here's a 0.94 patch (that's where I have all my testing setup). If folks like this patch I'll make a trunk version. You'll notice that things aren't quite as pretty anymore, with the byte[] + offset and length needing to be passed around. (we could envision an "ArrayPtr" object, which holds a reference to an array, offset, and length, but then that would another object to create)

            People

            • Assignee:
              lhofhansl Lars Hofhansl
              Reporter:
              lhofhansl Lars Hofhansl
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development