HBASE-4465

Lazy-seek optimization for StoreFile scanners

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Check the most recent file first before seeking all other files in a Store.

      Description

      Previously, if a column family in a region had several StoreFiles, we would seek in each of them and only then merge the results, even though the row/column we are looking for might be only in the most recent (and smallest) file. Now we prioritize reads so that the most recent file is checked first. This is done with a "lazy seek", which pretends that the next value in the StoreFile is (seekRow, seekColumn, lastTimestampInStoreFile), where lastTimestampInStoreFile is the highest timestamp in that file. Because higher timestamps sort first within a row/column, this fake KV is no later in the KV order than anything at or after the seek position that might actually occur in the file. So if we don't find the result in the files checked earlier (the more recent ones), the fake KV will bubble up to the top of the KV heap and a real seek will be done. This is expected to significantly reduce the amount of disk IO (as of 09/22/2011 we are doing dark launch testing and measurement).

      This is joint work with Liyin Tang – huge thanks to him for many helpful discussions on this and the idea of putting fake KVs with the highest timestamp of the StoreFile in the scanner priority queue.
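
      The idea in the description can be sketched as a toy model. This is not the patch itself: LazySeekSketch, FileScanner, and the TreeSet standing in for a StoreFile are hypothetical names chosen for illustration, and only requestSeek, enforceSeek, realSeekDone, and maxTimestampInFile echo identifiers that appear in the review thread. The realSeeks counter is an assumed stand-in for disk IO.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeSet;

/** Toy model of the lazy-seek idea; every class name here is hypothetical. */
final class LazySeekSketch {

    /** Simplified key: row, column, timestamp (no family, no value). */
    static final class KV {
        final String row;
        final String col;
        final long ts;

        KV(String row, String col, long ts) {
            this.row = row;
            this.col = col;
            this.ts = ts;
        }
    }

    /** KV order: row, then column, then timestamp descending (newer sorts first). */
    static final Comparator<KV> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.col.compareTo(b.col);
        if (c == 0) c = Long.compare(b.ts, a.ts);
        return c;
    };

    /** Stand-in for a per-file scanner; a TreeSet plays the role of the file. */
    static final class FileScanner {
        final TreeSet<KV> file = new TreeSet<>(ORDER);
        long maxTimestampInFile = Long.MIN_VALUE;
        KV cur;               // real KV after a real seek, fake KV after a lazy one
        KV seekKey;           // key remembered for a deferred real seek
        boolean realSeekDone;
        int realSeeks;        // counts simulated disk seeks

        FileScanner(KV... kvs) {
            for (KV kv : kvs) {
                file.add(kv);
                maxTimestampInFile = Math.max(maxTimestampInFile, kv.ts);
            }
        }

        void requestSeek(KV key) {
            realSeekDone = false;
            seekKey = key;
            if (key.ts > maxTimestampInFile) {
                // Fake key: not greater than any real key at or after the seek key.
                cur = new KV(key.row, key.col, maxTimestampInFile);
            } else {
                enforceSeek();
            }
        }

        void enforceSeek() {
            realSeeks++;
            cur = file.ceiling(seekKey); // first real KV >= seek key, or null
            realSeekDone = true;
        }
    }

    /** Returns the first real KV at or after key across all files, or null. */
    static KV seek(List<FileScanner> scanners, KV key) {
        PriorityQueue<FileScanner> heap =
            new PriorityQueue<>((a, b) -> ORDER.compare(a.cur, b.cur));
        for (FileScanner s : scanners) {
            s.requestSeek(key);
            if (s.cur != null) heap.add(s);
        }
        while (!heap.isEmpty()) {
            FileScanner top = heap.poll();
            if (top.realSeekDone) return top.cur; // a real KV reached the top
            top.enforceSeek();                    // a fake KV reached the top
            if (top.cur != null) heap.add(top);
        }
        return null;
    }
}
```

      In this sketch, seeking (row, column, Long.MAX_VALUE) across a newer and an older file performs a real seek only in the newer file when that file holds the requested row/column; the older file's fake KV never reaches the top of the heap.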

          Activity

          Jonathan Gray added a comment -

          love the elegant solution! +1 on this! bring back the old Get!

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2180/
          -----------------------------------------------------------

          Review request for hbase.

          This addresses bug HBASE-4465.
          https://issues.apache.org/jira/browse/HBASE-4465

          Diffs


          src/main/java/org/apache/hadoop/hbase/KeyValue.java aa34006
          src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java 94ddce7
          src/main/java/org/apache/hadoop/hbase/regionserver/ColumnCount.java 1be0280
          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java b8d33e8
          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java fbcd276
          src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 035f765
          src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java dad278a
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java abb5931
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 31bfea7
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 64a6e3e
          src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 8ad5aab
          src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 9d2b2a7

          Diff: https://reviews.apache.org/r/2180/diff

          Testing
          -------

          Running unit tests – please do not commit yet.

          Thanks,

          Mikhail

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2180/#review2332
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
          <https://reviews.apache.org/r/2180/#comment5392>

          Should be "lazily-sought"

          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
          <https://reviews.apache.org/r/2180/#comment5393>

          Should be 'real-sought' and 'lazily-sought'

          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
          <https://reviews.apache.org/r/2180/#comment5394>

          Should be 'is sought'

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
          <https://reviews.apache.org/r/2180/#comment5395>

          Should realSeekDone be set before returning?

          • Ted

          On 2011-10-04 22:10:40, Mikhail Bautin wrote:

          (Updated 2011-10-04 22:10:40)

          Review request for hbase.

          jiraposter@reviews.apache.org added a comment -

          On 2011-10-04 23:56:22, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java, line 306
          > <https://reviews.apache.org/r/2180/diff/1/?file=47924#file47924line306>
          >
          > Should be "lazily-sought"

          Somehow "sought" does not sound right to me – "seek" is a very specific computer science term here. Replaced with "has done a seek operation" here and below.

          On 2011-10-04 23:56:22, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java, line 246
          > <https://reviews.apache.org/r/2180/diff/1/?file=47930#file47930line246>
          >
          > Should realSeekDone be set before returning?

          realSeekDone is set to true by enforceSeek() in case we decide we need to do it as part of requestSeek():

          realSeekDone = false; // <-- setting this by default in case lazy seek
                                // takes effect or enforceSeek() fails
          . . .
          if (seekTimestamp > maxTimestampInFile) {
            // Create a fake key that is not greater than the real next key.
            // (Lower timestamps correspond to higher KVs.)
            // To understand this better, consider that we are asked to seek to
            // a higher timestamp than the max timestamp in this file. We know that
            // the next point when we have to consider this file again is when we
            // pass the max timestamp of this file (with the same row/column).
            cur = kv.createFirstOnRowColTS(maxTimestampInFile);
          } else {
            // This will be the case e.g. when we need to seek to the next
            // row/column, and we don't know exactly what they are, so we set the
            // seek key's timestamp to OLDEST_TIMESTAMP to skip the rest of this
            // row/column.
            enforceSeek(); // <-- this sets realSeekDone
          }
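
          The ordering argument in the comments above ("lower timestamps correspond to higher KVs") can be checked with a small sketch. Key, withTimestamp, and canSeekLazily are hypothetical names used only for illustration; they are not HBase's KeyValue API, and Long.MIN_VALUE is assumed here as a stand-in for OLDEST_TIMESTAMP.

```java
import java.util.Comparator;

/** Illustrative only: a simplified key, not HBase's KeyValue. */
final class KvOrderSketch {

    static final class Key {
        final String row;
        final String col;
        final long ts;

        Key(String row, String col, long ts) {
            this.row = row;
            this.col = col;
            this.ts = ts;
        }

        /** Same row/column with another timestamp (hypothetical helper). */
        Key withTimestamp(long newTs) {
            return new Key(row, col, newTs);
        }
    }

    /** Row, then column, then timestamp descending: a lower timestamp is a higher key. */
    static final Comparator<Key> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.col.compareTo(b.col);
        if (c == 0) c = Long.compare(b.ts, a.ts);
        return c;
    };

    /** Mirrors the branch above: lazy only if the seek timestamp exceeds the file's max. */
    static boolean canSeekLazily(Key seekKey, long maxTimestampInFile) {
        return seekKey.ts > maxTimestampInFile;
    }
}
```

          Under this ordering, a fake key built from the seek key with the file's max timestamp compares less than or equal to every real key the file could return for that seek, which is why deferring the real seek should not change the merged result. A seek key carrying the oldest possible timestamp never satisfies canSeekLazily, so it falls through to the real seek branch.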

          On 2011-10-04 23:56:22, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java, line 371
          > <https://reviews.apache.org/r/2180/diff/1/?file=47924#file47924line371>
          >
          > Should be 'real-sought' and 'lazily-sought'

          Done.

          On 2011-10-04 23:56:22, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java, line 101
          > <https://reviews.apache.org/r/2180/diff/1/?file=47925#file47925line101>
          >
          > Should be 'is sought'

          Done.

          • Mikhail

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2180/#review2332
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2180/
          -----------------------------------------------------------

          (Updated 2011-10-05 17:56:40.110499)

          Review request for hbase.

          Changes
          -------

          Replying to Ted's comments. Will post a new version of the diff shortly.

          This addresses bug HBASE-4465.
          https://issues.apache.org/jira/browse/HBASE-4465

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/KeyValue.java aa34006
          src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java 94ddce7
          src/main/java/org/apache/hadoop/hbase/regionserver/ColumnCount.java 1be0280
          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java b8d33e8
          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java fbcd276
          src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 035f765
          src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java dad278a
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java abb5931
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 31bfea7
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 64a6e3e
          src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 8ad5aab
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b3beabb
          src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 9d2b2a7

          Diff: https://reviews.apache.org/r/2180/diff

          Testing
          -------

          Running unit tests – please do not commit yet.

          Thanks,

          Mikhail

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2180/
          -----------------------------------------------------------

          (Updated 2011-10-05 17:59:30.185594)

          Review request for hbase.

          Changes
          -------

          Addressing Ted's comments and fixing TestBlocksRead because the number of blocks read has decreased in a few cases.

          This addresses bug HBASE-4465.
          https://issues.apache.org/jira/browse/HBASE-4465

          Diffs (updated)
          -----

          src/main/java/org/apache/hadoop/hbase/KeyValue.java aa34006
          src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java 94ddce7
          src/main/java/org/apache/hadoop/hbase/regionserver/ColumnCount.java 1be0280
          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java b8d33e8
          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java fbcd276
          src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 035f765
          src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java dad278a
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java abb5931
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 31bfea7
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 64a6e3e
          src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 8ad5aab
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b3beabb
          src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 9d2b2a7

          Diff: https://reviews.apache.org/r/2180/diff

          Testing
          -------

          Running unit tests – please do not commit yet.

          Thanks,

          Mikhail

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2180/
          -----------------------------------------------------------

          (Updated 2011-10-05 18:00:03.234737)

          Review request for hbase.

          Changes
          -------

          Updating testing done.

          Summary
          -------

          Previously, if we had several StoreFiles for a column family in a region, we would seek in each of them and only then merge the results, even though the row/column we are looking for might only be in the most recent (and the smallest) file. Now we prioritize our reads from those files so that we check the most recent file first. This is done by doing a "lazy seek" which pretends that the next value in the StoreFile is (seekRow, seekColumn, lastTimestampInStoreFile), which is earlier in the KV order than anything that might actually occur in the file. So if we don't find the result in earlier files, that fake KV will bubble up to the top of the KV heap and a real seek will be done. This is expected to significantly reduce the amount of disk IO (as of 09/22/2011 we are doing dark launch testing and measurement).

          This is joint work with Liyin Tang – huge thanks to him for many helpful discussions on this and the idea of putting fake KVs with the highest timestamp of the StoreFile in the scanner priority queue.

          This addresses bug HBASE-4465.
          https://issues.apache.org/jira/browse/HBASE-4465

          Diffs
          -----

          src/main/java/org/apache/hadoop/hbase/KeyValue.java aa34006
          src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java 94ddce7
          src/main/java/org/apache/hadoop/hbase/regionserver/ColumnCount.java 1be0280
          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java b8d33e8
          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java fbcd276
          src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 035f765
          src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java dad278a
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java abb5931
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 31bfea7
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 64a6e3e
          src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 8ad5aab
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b3beabb
          src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 9d2b2a7

          Diff: https://reviews.apache.org/r/2180/diff

          Testing (updated)
          -------

          All unit tests should be passing now. Will rebase and re-run again just in case.

          Thanks,

          Mikhail

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2180/#review2354
          -----------------------------------------------------------

          TestBlocksRead changes look good.

          Nice work Mikhail/Liyin-- this is a huge optimization!

          • Kannan


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2180/#review2358
          -----------------------------------------------------------

          That is a pretty cool idea.
          I was looking at this w.r.t. HBASE-4071 and HBASE-4241, and it looks good from that viewpoint.

          • Lars


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2180/#review2360
          -----------------------------------------------------------

          Ship it!

          This is some really awesome work, Mikhail and Liyin. You guys have taken our codez to a new level with all the changes you guys are making. And the fake KV idea is super elegant. Nice job! This is going to be a major performance win across so many applications.

          Bring back the old get!

          • Jonathan


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2180/#review2361
          -----------------------------------------------------------

          Thanks Mikhail. I have reviewed this diff internally and it looks good to me.

          • Liyin


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2180/
          -----------------------------------------------------------

          (Updated 2011-10-05 19:11:26.781722)

          Review request for hbase.

          Changes
          -------

          Getting rid of duplicate import of AtomicLong in StoreFileScanner.

          Summary
          -------

          Previously, if we had several StoreFiles for a column family in a region, we would seek in each of them and only then merge the results, even though the row/column we are looking for might only be in the most recent (and the smallest) file. Now we prioritize our reads from those files so that we check the most recent file first. This is done by doing a "lazy seek" which pretends that the next value in the StoreFile is (seekRow, seekColumn, lastTimestampInStoreFile), which is earlier in the KV order than anything that might actually occur in the file. So if we don't find the result in earlier files, that fake KV will bubble up to the top of the KV heap and a real seek will be done. This is expected to significantly reduce the amount of disk IO (as of 09/22/2011 we are doing dark launch testing and measurement).

          This is joint work with Liyin Tang – huge thanks to him for many helpful discussions on this and the idea of putting fake KVs with the highest timestamp of the StoreFile in the scanner priority queue.

          This addresses bug HBASE-4465.
          https://issues.apache.org/jira/browse/HBASE-4465

          Diffs (updated)
          -----

          src/main/java/org/apache/hadoop/hbase/KeyValue.java aa34006
          src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java 94ddce7
          src/main/java/org/apache/hadoop/hbase/regionserver/ColumnCount.java 1be0280
          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java b8d33e8
          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java fbcd276
          src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 035f765
          src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java dad278a
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java abb5931
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 31bfea7
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 64a6e3e
          src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 8ad5aab
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b3beabb
          src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 9d2b2a7

          Diff: https://reviews.apache.org/r/2180/diff

          Testing
          -------

          All unit tests should be passing now. Will rebase and re-run again just in case.

          Thanks,

          Mikhail
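
For readers following the thread, the lazy-seek idea in the summary above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from the patch: the class and method names (`LazySeekSketch`, `FileScanner`, `lazySeek`, `enforceSeek`, `lazyGet`) are made up for illustration, store files are modelled as in-memory sorted lists, and a "real seek" is just a counted linear search standing in for disk IO.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class LazySeekSketch {
    /** Toy key: (row, column, timestamp). Hypothetical, not HBase's KeyValue. */
    public static final class KV {
        public final String row;
        public final String col;
        public final long ts;
        public KV(String row, String col, long ts) { this.row = row; this.col = col; this.ts = ts; }
        @Override public String toString() { return row + "/" + col + "@" + ts; }
    }

    /** Rows and columns ascending, timestamps descending (newest sorts first). */
    static final Comparator<KV> KV_ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.col.compareTo(b.col);
        if (c == 0) c = Long.compare(b.ts, a.ts);
        return c;
    };

    /** Stand-in for a scanner over one store file (an in-memory sorted list here). */
    public static final class FileScanner {
        final List<KV> sorted;
        final long maxTs;          // highest timestamp present in this "file"
        KV seekKey;                // key of the pending (delayed) seek
        KV current;                // fake KV until a real seek has been done
        boolean realSeekDone;
        public int realSeeks;      // counts simulated disk seeks

        public FileScanner(KV... kvs) {
            sorted = new ArrayList<>(Arrays.asList(kvs));
            sorted.sort(KV_ORDER);
            long m = Long.MIN_VALUE;
            for (KV kv : sorted) m = Math.max(m, kv.ts);
            maxTs = m;
        }

        /** Pretend the next KV is (seekRow, seekColumn, maxTs) without touching the file. */
        void lazySeek(KV key) {
            seekKey = key;
            current = new KV(key.row, key.col, maxTs);
            realSeekDone = false;
        }

        /** Do the delayed seek: position on the first real KV at or after seekKey. */
        void enforceSeek() {
            realSeeks++;
            current = null;
            for (KV kv : sorted) {
                if (KV_ORDER.compare(kv, seekKey) >= 0) { current = kv; break; }
            }
            realSeekDone = true;
        }
    }

    /** Returns the first real KV at or after seekKey across all files, or null. */
    public static KV lazyGet(List<FileScanner> scanners, KV seekKey) {
        PriorityQueue<FileScanner> heap =
                new PriorityQueue<>((a, b) -> KV_ORDER.compare(a.current, b.current));
        for (FileScanner s : scanners) { s.lazySeek(seekKey); heap.add(s); }
        while (!heap.isEmpty()) {
            FileScanner top = heap.poll();
            if (top.realSeekDone) return top.current;   // a real KV reached the top
            top.enforceSeek();                          // a fake KV bubbled up: seek for real
            if (top.current != null) heap.add(top);
        }
        return null;
    }

    /** Three toy files, newest first; their max timestamps are 300, 200 and 100. */
    public static List<FileScanner> sampleFiles() {
        return Arrays.asList(
                new FileScanner(new KV("row1", "c", 250), new KV("row2", "c", 300)),
                new FileScanner(new KV("row1", "c", 150), new KV("row3", "c", 200)),
                new FileScanner(new KV("row0", "c", 100), new KV("row1", "c", 50)));
    }

    public static void main(String[] args) {
        List<FileScanner> files = sampleFiles();
        KV hit = lazyGet(files, new KV("row1", "c", Long.MAX_VALUE));
        int seeks = 0;
        for (FileScanner s : files) seeks += s.realSeeks;
        System.out.println("result=" + hit + " realSeeks=" + seeks);
    }
}
```

In this toy setup a read of `row1/c` is answered from the newest file after a single real seek, because the fake KVs of the two older files carry lower timestamps and never reach the top of the heap. A caller doing a Get would still need to check that the returned KV has the requested row and column.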

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2180/#review2363
          -----------------------------------------------------------

          Ship it!

          - Ted

          On 2011-10-05 19:11:26, Mikhail Bautin wrote:

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2180/
          -----------------------------------------------------------

          (Updated 2011-10-05 19:11:26)

          Review request for hbase.

          Summary
          -------

          Previously, if we had several StoreFiles for a column family in a region, we would seek in each of them and only then merge the results, even though the row/column we are looking for might only be in the most recent (and the smallest) file. Now we prioritize our reads from those files so that we check the most recent file first. This is done by doing a "lazy seek" which pretends that the next value in the StoreFile is (seekRow, seekColumn, lastTimestampInStoreFile), which is earlier in the KV order than anything that might actually occur in the file. So if we don't find the result in earlier files, that fake KV will bubble up to the top of the KV heap and a real seek will be done. This is expected to significantly reduce the amount of disk IO (as of 09/22/2011 we are doing dark launch testing and measurement).

          This is joint work with Liyin Tang – huge thanks to him for many helpful discussions on this and the idea of putting fake KVs with the highest timestamp of the StoreFile in the scanner priority queue.

          This addresses bug HBASE-4465.
          https://issues.apache.org/jira/browse/HBASE-4465

          Diffs
          -----

          src/main/java/org/apache/hadoop/hbase/KeyValue.java aa34006
          src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java 94ddce7
          src/main/java/org/apache/hadoop/hbase/regionserver/ColumnCount.java 1be0280
          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java b8d33e8
          src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java fbcd276
          src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 035f765
          src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java dad278a
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java abb5931
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 31bfea7
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 64a6e3e
          src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 8ad5aab
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b3beabb
          src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 9d2b2a7

          Diff: https://reviews.apache.org/r/2180/diff

          Testing
          -------

          All unit tests should be passing now. Will rebase and re-run again just in case.

          Thanks,

          Mikhail

          Mikhail Bautin added a comment -

          All unit tests passed, ready to be committed.

          Jonathan Gray added a comment -

          Please attach the final patch to JIRA.

          Mikhail Bautin added a comment -

          Attaching the latest version of the patch.

          Jonathan Gray added a comment -

          Committed to trunk. What's the status on the 89 branch? Should we keep this open?

          Mikhail Bautin added a comment -

          Thanks, Jonathan! I just chatted with Nicolas and he said we should not worry about the 89 branch yet because he will be syncing our internal changes into the public 89 branch.

          Jonathan Gray added a comment -

          Nice work Liyin and Mikhail!

          stack added a comment -

          Is this feature on by default? It seems to be. I'm not sure.

          Mikhail Bautin added a comment -

          @Stack: yes, this feature is on by default, because its performance is the same as or better than before in all cases.

          stack added a comment -

          Thanks Mikhail.


            People

            • Assignee: Mikhail Bautin
            • Reporter: Mikhail Bautin