HBase
  1. HBase
  2. HBASE-4241

Optimize flushing of the Store cache for max versions and (new) min versions

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.92.0
    • Fix Version/s: 0.92.0
    • Component/s: regionserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      As discussed with with Jon, there is room for improvement in how the memstore is flushed to disk.
      Currently only expired KVs are pruned before flushing, but we can also prune versions if we find at least maxVersions versions in the memstore.
      The same holds for the new minversion feature: If we find at least minVersion versions in the store we can remove all further versions that are expired.

      Generally we should use the same mechanism here that is used for Compaction. I.e. StoreScanner. We only need to add a scanner to Memstore that can scan along the current snapshot.

      1. 4241.txt
        12 kB
        Lars Hofhansl
      2. 4241-v2.txt
        13 kB
        Lars Hofhansl
      3. 4241-v8.txt
        15 kB
        Lars Hofhansl

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          Hide
          Lars Hofhansl added a comment -

          During memstore flush we only want to read from the snapshot. There is no principal reason why using memstore scanner for this wouldn't work (need to make sure that kvsetIt never has anything to iterate).

          The memstore scanner now also mucks with readpoints (although I think that is new). It seemed easier to just wrap a simple scanner wrapper instance around the snapshot.

          Wanna try using memstore scanner? If it turns out to be simpler I am happy to change it.

          Show
          Lars Hofhansl added a comment - During memstore flush we only want to read from the snapshot. There is no principal reason why using memstore scanner for this wouldn't work (need to make sure that kvsetIt never has anything to iterate). The memstore scanner now also mucks with readpoints (although I think that is new). It seemed easier to just wrap a simple scanner wrapper instance around the snapshot. Wanna try using memstore scanner? If it turns out to be simpler I am happy to change it.
          Hide
          Lars Hofhansl added a comment -

          Hi Liyin, just seeing this now. Let me have a look.

          Show
          Lars Hofhansl added a comment - Hi Liyin, just seeing this now. Let me have a look.
          Hide
          Liyin Tang added a comment -

          Hi Lars, Thanks for your patch. I am trying to back port this feature for hbase-89
          I have a quick question

          why we use CollectionBackedScanner but not reuse memstore scanner?

          Thanks a lot

          Show
          Liyin Tang added a comment - Hi Lars, Thanks for your patch. I am trying to back port this feature for hbase-89 I have a quick question why we use CollectionBackedScanner but not reuse memstore scanner? Thanks a lot
          Hide
          Lars Hofhansl added a comment -

          Final patch for future reference.

          Show
          Lars Hofhansl added a comment - Final patch for future reference.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2146 (See https://builds.apache.org/job/HBase-TRUNK/2146/)
          HBASE-4241 Optimize flushing of the Memstore

          stack :
          Files :

          • /hbase/trunk/CHANGES.txt
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2146 (See https://builds.apache.org/job/HBase-TRUNK/2146/ ) HBASE-4241 Optimize flushing of the Memstore stack : Files : /hbase/trunk/CHANGES.txt /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2011-08-25 16:08:05, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 523

          > <https://reviews.apache.org/r/1650/diff/5/?file=35531#file35531line523>

          >

          > Whitespace.

          I got this on the commit

          • Michael

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1650/#review1645
          -----------------------------------------------------------

          On 2011-08-25 15:43:47, Lars Hofhansl wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/1650/

          -----------------------------------------------------------

          (Updated 2011-08-25 15:43:47)

          Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

          Summary

          -------

          This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway.

          This covers two scenarios:

          1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk.

          2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk.

          This changes the Store flush to use the same mechanism that used for compactions.

          I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot.

          This addresses bug HBASE-4241.

          https://issues.apache.org/jira/browse/HBASE-4241

          Diffs

          -----

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347

          Diff: https://reviews.apache.org/r/1650/diff

          Testing

          -------

          Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change).

          I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect).

          No new test, as this should cause no functional change.

          Thanks,

          Lars

          Show
          jiraposter@reviews.apache.org added a comment - On 2011-08-25 16:08:05, Ted Yu wrote: > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java , line 523 > < https://reviews.apache.org/r/1650/diff/5/?file=35531#file35531line523 > > > Whitespace. I got this on the commit Michael ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1645 ----------------------------------------------------------- On 2011-08-25 15:43:47, Lars Hofhansl wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- (Updated 2011-08-25 15:43:47) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241 . https://issues.apache.org/jira/browse/HBASE-4241 Diffs ----- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars
          Hide
          stack added a comment -

          Committed to TRUNK. Thanks for the patch Lars. Nice reviewing Ted.

          Show
          stack added a comment - Committed to TRUNK. Thanks for the patch Lars. Nice reviewing Ted.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1650/#review1645
          -----------------------------------------------------------

          Ship it!

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          <https://reviews.apache.org/r/1650/#comment3684>

          Whitespace.

          • Ted

          On 2011-08-25 15:43:47, Lars Hofhansl wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/1650/

          -----------------------------------------------------------

          (Updated 2011-08-25 15:43:47)

          Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

          Summary

          -------

          This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway.

          This covers two scenarios:

          1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk.

          2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk.

          This changes the Store flush to use the same mechanism that used for compactions.

          I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot.

          This addresses bug HBASE-4241.

          https://issues.apache.org/jira/browse/HBASE-4241

          Diffs

          -----

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347

          Diff: https://reviews.apache.org/r/1650/diff

          Testing

          -------

          Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change).

          I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect).

          No new test, as this should cause no functional change.

          Thanks,

          Lars

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1645 ----------------------------------------------------------- Ship it! http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java < https://reviews.apache.org/r/1650/#comment3684 > Whitespace. Ted On 2011-08-25 15:43:47, Lars Hofhansl wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- (Updated 2011-08-25 15:43:47) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241 . https://issues.apache.org/jira/browse/HBASE-4241 Diffs ----- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1650/
          -----------------------------------------------------------

          (Updated 2011-08-25 15:43:47.733133)

          Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

          Changes
          -------

          Make sure the StoreScanner is always closed (in analogy what Store.compactStore(...) does).

          Summary
          -------

          This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway.
          This covers two scenarios:
          1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk.
          2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk.

          This changes the Store flush to use the same mechanism that used for compactions.
          I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot.

          This addresses bug HBASE-4241.
          https://issues.apache.org/jira/browse/HBASE-4241

          Diffs (updated)


          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347

          Diff: https://reviews.apache.org/r/1650/diff

          Testing
          -------

          Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change).
          I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect).

          No new test, as this should cause no functional change.

          Thanks,

          Lars

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- (Updated 2011-08-25 15:43:47.733133) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Changes ------- Make sure the StoreScanner is always closed (in analogy what Store.compactStore(...) does). Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241 . https://issues.apache.org/jira/browse/HBASE-4241 Diffs (updated) http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1650/#review1636
          -----------------------------------------------------------

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          <https://reviews.apache.org/r/1650/#comment3677>

          The more we put into finally block, the higher the chance that the last close() may be skipped by exception from previous calls to writer (lines 518 and 520).
          Ideally calls to writer should be enclosed in try/catch blocks.

          • Ted

          On 2011-08-25 05:38:13, Lars Hofhansl wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/1650/

          -----------------------------------------------------------

          (Updated 2011-08-25 05:38:13)

          Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

          Summary

          -------

          This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway.

          This covers two scenarios:

          1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk.

          2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk.

          This changes the Store flush to use the same mechanism that used for compactions.

          I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot.

          This addresses bug HBASE-4241.

          https://issues.apache.org/jira/browse/HBASE-4241

          Diffs

          -----

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347

          Diff: https://reviews.apache.org/r/1650/diff

          Testing

          -------

          Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change).

          I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect).

          No new test, as this should cause no functional change.

          Thanks,

          Lars

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1636 ----------------------------------------------------------- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java < https://reviews.apache.org/r/1650/#comment3677 > The more we put into finally block, the higher the chance that the last close() may be skipped by exception from previous calls to writer (lines 518 and 520). Ideally calls to writer should be enclosed in try/catch blocks. Ted On 2011-08-25 05:38:13, Lars Hofhansl wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- (Updated 2011-08-25 05:38:13) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241 . https://issues.apache.org/jira/browse/HBASE-4241 Diffs ----- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1650/
          -----------------------------------------------------------

          (Updated 2011-08-25 05:38:13.482663)

          Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

          Summary
          -------

          This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway.
          This covers two scenarios:
          1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk.
          2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk.

          This changes the Store flush to use the same mechanism that used for compactions.
          I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot.

          This addresses bug HBASE-4241.
          https://issues.apache.org/jira/browse/HBASE-4241

          Diffs (updated)


          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347

          Diff: https://reviews.apache.org/r/1650/diff

          Testing
          -------

          Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change).
          I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect).

          No new test, as this should cause no functional change.

          Thanks,

          Lars

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- (Updated 2011-08-25 05:38:13.482663) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241 . https://issues.apache.org/jira/browse/HBASE-4241 Diffs (updated) http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2011-08-25 05:16:29, Michael Stack wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java, line 33

          > <https://reviews.apache.org/r/1650/diff/3/?file=35483#file35483line33>

          >

          > Nice

          I stole most of this from the test code

          On 2011-08-25 05:16:29, Michael Stack wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 505

          > <https://reviews.apache.org/r/1650/diff/3/?file=35482#file35482line505>

          >

          > This is nice, the way you are going to the scanner to next rather than what we did before where we'd iterate a set.

          It's also nice because now it uses the exact same logic that is used during compaction.

          On 2011-08-25 05:16:29, Michael Stack wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 488

          > <https://reviews.apache.org/r/1650/diff/3/?file=35482#file35482line488>

          >

          > oh. so we never cared about this before? we'd just keep doing versions even if well in excess of max? We were pruning versions though? Higher in the stack? In the scanner that called this one?

          Versions were only ever pruned as part of compactions. (A StoreScanner is also in Store.compactStore(...))

          We can't get rid of all excess versions at flush time (as not all of them are in the memstore), but we can make a good effort to avoid flushing some of the versions.

          On 2011-08-25 05:16:29, Michael Stack wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 516

          > <https://reviews.apache.org/r/1650/diff/3/?file=35482#file35482line516>

          >

          > Should this be in a finally block? Would it make a diff if this was left hanging open because of some exception? We can get one up out of the scanner.next, right?

          >

          > oh, there is a finally just after... maybe move the scanner.close in there?

          Hmm... I was thinking since KeyValueScanner.close() can throw IOException I don't want it to be in the finally clause.
          I also think this close this close() is not actually needed.
          Looking at StoreScanner.close(), though, I see that it does not throw anything.

          Not sure about this one, I'll move it into the finally block, after the close of the writer, just to be save.

          I'll upload a new version soon and also includes suggestions from Ted.

          • Lars

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1650/#review1628
          -----------------------------------------------------------

          On 2011-08-25 04:49:36, Lars Hofhansl wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/1650/

          -----------------------------------------------------------

          (Updated 2011-08-25 04:49:36)

          Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

          Summary

          -------

          This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway.

          This covers two scenarios:

          1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk.

          2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk.

          This changes the Store flush to use the same mechanism that used for compactions.

          I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot.

          This addresses bug HBASE-4241.

          https://issues.apache.org/jira/browse/HBASE-4241

          Diffs

          -----

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347

          Diff: https://reviews.apache.org/r/1650/diff

          Testing

          -------

          Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change).

          I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect).

          No new test, as this should cause no functional change.

          Thanks,

          Lars

          Show
          jiraposter@reviews.apache.org added a comment - On 2011-08-25 05:16:29, Michael Stack wrote: > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java , line 33 > < https://reviews.apache.org/r/1650/diff/3/?file=35483#file35483line33 > > > Nice I stole most of this from the test code On 2011-08-25 05:16:29, Michael Stack wrote: > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java , line 505 > < https://reviews.apache.org/r/1650/diff/3/?file=35482#file35482line505 > > > This is nice, the way you are going to the scanner to next rather than what we did before where we'd iterate a set. It's also nice because now it uses the exact same logic that is used during compaction. On 2011-08-25 05:16:29, Michael Stack wrote: > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java , line 488 > < https://reviews.apache.org/r/1650/diff/3/?file=35482#file35482line488 > > > oh. so we never cared about this before? we'd just keep doing versions even if well in excess of max? We were pruning versions though? Higher in the stack? In the scanner that called this one? Versions were only ever pruned as part of compactions. (A StoreScanner is also in Store.compactStore(...)) We can't get rid of all excess versions at flush time (as not all of them are in the memstore), but we can make a good effort to avoid flushing some of the versions. On 2011-08-25 05:16:29, Michael Stack wrote: > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java , line 516 > < https://reviews.apache.org/r/1650/diff/3/?file=35482#file35482line516 > > > Should this be in a finally block? Would it make a diff if this was left hanging open because of some exception? We can get one up out of the scanner.next, right? > > oh, there is a finally just after... maybe move the scanner.close in there? Hmm... I was thinking since KeyValueScanner.close() can throw IOException I don't want it to be in the finally clause. I also think this close this close() is not actually needed. Looking at StoreScanner.close(), though, I see that it does not throw anything. Not sure about this one, I'll move it into the finally block, after the close of the writer, just to be save. I'll upload a new version soon and also includes suggestions from Ted. Lars ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1628 ----------------------------------------------------------- On 2011-08-25 04:49:36, Lars Hofhansl wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- (Updated 2011-08-25 04:49:36) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241 . https://issues.apache.org/jira/browse/HBASE-4241 Diffs ----- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1650/#review1628
          -----------------------------------------------------------

          Ship it!

          Minor comments below. Will wait on your feedback before commit. There is one thing to fix I think – the finally thing. Let me know and I can do it on commit.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          <https://reviews.apache.org/r/1650/#comment3661>

          oh. so we never cared about this before? we'd just keep doing versions even if well in excess of max? We were pruning versions though? Higher in the stack? In the scanner that called this one?

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          <https://reviews.apache.org/r/1650/#comment3663>

          This is nice, the way you are going to the scanner to next rather than what we did before where we'd iterate a set.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          <https://reviews.apache.org/r/1650/#comment3664>

          Should this be in a finally block? Would it make a diff if this was left hanging open because of some exception? We can get one up out of the scanner.next, right?

          oh, there is a finally just after... maybe move the scanner.close in there?

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
          <https://reviews.apache.org/r/1650/#comment3665>

          Nice

          • Michael

          On 2011-08-25 04:49:36, Lars Hofhansl wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/1650/

          -----------------------------------------------------------

          (Updated 2011-08-25 04:49:36)

          Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

          Summary

          -------

          This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway.

          This covers two scenarios:

          1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk.

          2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk.

          This changes the Store flush to use the same mechanism that used for compactions.

          I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot.

          This addresses bug HBASE-4241.

          https://issues.apache.org/jira/browse/HBASE-4241

          Diffs

          -----

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347

          Diff: https://reviews.apache.org/r/1650/diff

          Testing

          -------

          Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change).

          I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect).

          No new test, as this should cause no functional change.

          Thanks,

          Lars

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1628 ----------------------------------------------------------- Ship it! Minor comments below. Will wait on your feedback before commit. There is one thing to fix I think – the finally thing. Let me know and I can do it on commit. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java < https://reviews.apache.org/r/1650/#comment3661 > oh. so we never cared about this before? we'd just keep doing versions even if well in excess of max? We were pruning versions though? Higher in the stack? In the scanner that called this one? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java < https://reviews.apache.org/r/1650/#comment3663 > This is nice, the way you are going to the scanner to next rather than what we did before where we'd iterate a set. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java < https://reviews.apache.org/r/1650/#comment3664 > Should this be in a finally block? Would it make a diff if this was left hanging open because of some exception? We can get one up out of the scanner.next, right? oh, there is a finally just after... maybe move the scanner.close in there? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java < https://reviews.apache.org/r/1650/#comment3665 > Nice Michael On 2011-08-25 04:49:36, Lars Hofhansl wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- (Updated 2011-08-25 04:49:36) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241 . https://issues.apache.org/jira/browse/HBASE-4241 Diffs ----- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2011-08-25 05:12:20, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java, line 64

          > <https://reviews.apache.org/r/1650/diff/3/?file=35483#file35483line64>

          >

          > I think vararg form (KeyValue... incData) is more flexible than array form ([]).

          > vararg has to be the last parameter.

          Oh I see what you mean. Agreed, will upload a new version in minute.

          • Lars

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1650/#review1629
          -----------------------------------------------------------

          On 2011-08-25 04:49:36, Lars Hofhansl wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/1650/

          -----------------------------------------------------------

          (Updated 2011-08-25 04:49:36)

          Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

          Summary

          -------

          This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway.

          This covers two scenarios:

          1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk.

          2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk.

          This changes the Store flush to use the same mechanism that used for compactions.

          I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot.

          This addresses bug HBASE-4241.

          https://issues.apache.org/jira/browse/HBASE-4241

          Diffs

          -----

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347

          Diff: https://reviews.apache.org/r/1650/diff

          Testing

          -------

          Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change).

          I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect).

          No new test, as this should cause no functional change.

          Thanks,

          Lars

          Show
          jiraposter@reviews.apache.org added a comment - On 2011-08-25 05:12:20, Ted Yu wrote: > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java , line 64 > < https://reviews.apache.org/r/1650/diff/3/?file=35483#file35483line64 > > > I think vararg form (KeyValue... incData) is more flexible than array form ([]). > vararg has to be the last parameter. Oh I see what you mean. Agreed, will upload a new version in minute. Lars ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1629 ----------------------------------------------------------- On 2011-08-25 04:49:36, Lars Hofhansl wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- (Updated 2011-08-25 04:49:36) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241 . https://issues.apache.org/jira/browse/HBASE-4241 Diffs ----- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1650/#review1629
          -----------------------------------------------------------

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
          <https://reviews.apache.org/r/1650/#comment3662>

          I think vararg form (KeyValue... incData) is more flexible than array form ([]).
          vararg has to be the last parameter.

          • Ted

          On 2011-08-25 04:49:36, Lars Hofhansl wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/1650/

          -----------------------------------------------------------

          (Updated 2011-08-25 04:49:36)

          Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

          Summary

          -------

          This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway.

          This covers two scenarios:

          1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk.

          2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk.

          This changes the Store flush to use the same mechanism that used for compactions.

          I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot.

          This addresses bug HBASE-4241.

          https://issues.apache.org/jira/browse/HBASE-4241

          Diffs

          -----

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347

          Diff: https://reviews.apache.org/r/1650/diff

          Testing

          -------

          Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change).

          I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect).

          No new test, as this should cause no functional change.

          Thanks,

          Lars

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1629 ----------------------------------------------------------- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java < https://reviews.apache.org/r/1650/#comment3662 > I think vararg form (KeyValue... incData) is more flexible than array form ([]). vararg has to be the last parameter. Ted On 2011-08-25 04:49:36, Lars Hofhansl wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- (Updated 2011-08-25 04:49:36) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241 . https://issues.apache.org/jira/browse/HBASE-4241 Diffs ----- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1650/
          -----------------------------------------------------------

          (Updated 2011-08-25 04:49:36.340305)

          Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

          Changes
          -------

          Added linebreak.
          Added comment that deletes need to be retained.

          Summary
          -------

          This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway.
          This covers two scenarios:
          1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk.
          2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk.

          This changes the Store flush to use the same mechanism that used for compactions.
          I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot.

          This addresses bug HBASE-4241.
          https://issues.apache.org/jira/browse/HBASE-4241

          Diffs (updated)


          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347

          Diff: https://reviews.apache.org/r/1650/diff

          Testing
          -------

          Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change).
          I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect).

          No new test, as this should cause no functional change.

          Thanks,

          Lars

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- (Updated 2011-08-25 04:49:36.340305) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Changes ------- Added linebreak. Added comment that deletes need to be retained. Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241 . https://issues.apache.org/jira/browse/HBASE-4241 Diffs (updated) http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2011-08-25 04:29:19, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java, line 64

          > <https://reviews.apache.org/r/1650/diff/2/?file=35430#file35430line64>

          >

          > Minor comment:

          > This ctor is used by KeyValueScanFixture below. Putting comparator as first parameter would make it more readable.

          Lars Hofhansl wrote:

          Agreed, I'll change the order.

          Actually all the other constructors take the comparator as the 2nd argument.
          I think I either change them all or leave them all in this order.

          • Lars

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1650/#review1625
          -----------------------------------------------------------

          On 2011-08-25 01:28:34, Lars Hofhansl wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/1650/

          -----------------------------------------------------------

          (Updated 2011-08-25 01:28:34)

          Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

          Summary

          -------

          This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway.

          This covers two scenarios:

          1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk.

          2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk.

          This changes the Store flush to use the same mechanism that used for compactions.

          I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot.

          This addresses bug HBASE-4241.

          https://issues.apache.org/jira/browse/HBASE-4241

          Diffs

          -----

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347

          Diff: https://reviews.apache.org/r/1650/diff

          Testing

          -------

          Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change).

          I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect).

          No new test, as this should cause no functional change.

          Thanks,

          Lars

          Show
          jiraposter@reviews.apache.org added a comment - On 2011-08-25 04:29:19, Ted Yu wrote: > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java , line 64 > < https://reviews.apache.org/r/1650/diff/2/?file=35430#file35430line64 > > > Minor comment: > This ctor is used by KeyValueScanFixture below. Putting comparator as first parameter would make it more readable. Lars Hofhansl wrote: Agreed, I'll change the order. Actually all the other constructors take the comparator as the 2nd argument. I think I either change them all or leave them all in this order. Lars ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1625 ----------------------------------------------------------- On 2011-08-25 01:28:34, Lars Hofhansl wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- (Updated 2011-08-25 01:28:34) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241 . https://issues.apache.org/jira/browse/HBASE-4241 Diffs ----- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2011-08-25 04:29:19, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 490

          > <https://reviews.apache.org/r/1650/diff/2/?file=35429#file35429line490>

          >

          > Line wrap is needed for this long line.

          Will do.

          On 2011-08-25 04:29:19, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 491

          > <https://reviews.apache.org/r/1650/diff/2/?file=35429#file35429line491>

          >

          > This should be false.

          This definitely needs to be true. Deletes need to be retained when the cache is flushed.
          True, means to retain deletes.

          (I had this the wrong way in my first attempt as well and then wondered why all the deleted rows were not removed)

          On 2011-08-25 04:29:19, Ted Yu wrote:

          > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java, line 64

          > <https://reviews.apache.org/r/1650/diff/2/?file=35430#file35430line64>

          >

          > Minor comment:

          > This ctor is used by KeyValueScanFixture below. Putting comparator as first parameter would make it more readable.

          Agreed, I'll change the order.

          • Lars

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1650/#review1625
          -----------------------------------------------------------

          On 2011-08-25 01:28:34, Lars Hofhansl wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/1650/

          -----------------------------------------------------------

          (Updated 2011-08-25 01:28:34)

          Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

          Summary

          -------

          This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway.

          This covers two scenarios:

          1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk.

          2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk.

          This changes the Store flush to use the same mechanism that used for compactions.

          I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot.

          This addresses bug HBASE-4241.

          https://issues.apache.org/jira/browse/HBASE-4241

          Diffs

          -----

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347

          Diff: https://reviews.apache.org/r/1650/diff

          Testing

          -------

          Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change).

          I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect).

          No new test, as this should cause no functional change.

          Thanks,

          Lars

          Show
          jiraposter@reviews.apache.org added a comment - On 2011-08-25 04:29:19, Ted Yu wrote: > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java , line 490 > < https://reviews.apache.org/r/1650/diff/2/?file=35429#file35429line490 > > > Line wrap is needed for this long line. Will do. On 2011-08-25 04:29:19, Ted Yu wrote: > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java , line 491 > < https://reviews.apache.org/r/1650/diff/2/?file=35429#file35429line491 > > > This should be false. This definitely needs to be true. Deletes need to be retained when the cache is flushed. True, means to retain deletes. (I had this the wrong way in my first attempt as well and then wondered why all the deleted rows were not removed) On 2011-08-25 04:29:19, Ted Yu wrote: > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java , line 64 > < https://reviews.apache.org/r/1650/diff/2/?file=35430#file35430line64 > > > Minor comment: > This ctor is used by KeyValueScanFixture below. Putting comparator as first parameter would make it more readable. Agreed, I'll change the order. Lars ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1625 ----------------------------------------------------------- On 2011-08-25 01:28:34, Lars Hofhansl wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- (Updated 2011-08-25 01:28:34) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241 . https://issues.apache.org/jira/browse/HBASE-4241 Diffs ----- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1650/#review1625
          -----------------------------------------------------------

          Nice work.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          <https://reviews.apache.org/r/1650/#comment3654>

          Line wrap is needed for this long line.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          <https://reviews.apache.org/r/1650/#comment3655>

          This should be false.

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
          <https://reviews.apache.org/r/1650/#comment3656>

          Minor comment:
          This ctor is used by KeyValueScanFixture below. Putting comparator as first parameter would make it more readable.

          • Ted

          On 2011-08-25 01:28:34, Lars Hofhansl wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/1650/

          -----------------------------------------------------------

          (Updated 2011-08-25 01:28:34)

          Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

          Summary

          -------

          This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway.

          This covers two scenarios:

          1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk.

          2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk.

          This changes the Store flush to use the same mechanism that used for compactions.

          I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot.

          This addresses bug HBASE-4241.

          https://issues.apache.org/jira/browse/HBASE-4241

          Diffs

          -----

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347

          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347

          Diff: https://reviews.apache.org/r/1650/diff

          Testing

          -------

          Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change).

          I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect).

          No new test, as this should cause no functional change.

          Thanks,

          Lars

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/#review1625 ----------------------------------------------------------- Nice work. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java < https://reviews.apache.org/r/1650/#comment3654 > Line wrap is needed for this long line. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java < https://reviews.apache.org/r/1650/#comment3655 > This should be false. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java < https://reviews.apache.org/r/1650/#comment3656 > Minor comment: This ctor is used by KeyValueScanFixture below. Putting comparator as first parameter would make it more readable. Ted On 2011-08-25 01:28:34, Lars Hofhansl wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- (Updated 2011-08-25 01:28:34) Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241 . https://issues.apache.org/jira/browse/HBASE-4241 Diffs ----- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars
          Hide
          Lars Hofhansl added a comment -

          I have an (unrealistic) local test scenario that basically "updates" the same row in a state table. VERSIONS is set to 1.

          In this scenario 1m "updates" produced > 50mb of garbage that is completely avoided with this change.

          Show
          Lars Hofhansl added a comment - I have an (unrealistic) local test scenario that basically "updates" the same row in a state table. VERSIONS is set to 1. In this scenario 1m "updates" produced > 50mb of garbage that is completely avoided with this change.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1650/
          -----------------------------------------------------------

          Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.

          Summary
          -------

          This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway.
          This covers two scenarios:
          1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk.
          2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk.

          This changes the Store flush to use the same mechanism that used for compactions.
          I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot.

          This addresses bug HBASE-4241.
          https://issues.apache.org/jira/browse/HBASE-4241

          Diffs


          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347
          http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347

          Diff: https://reviews.apache.org/r/1650/diff

          Testing
          -------

          Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change).
          I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect).

          No new test, as this should cause no functional change.

          Thanks,

          Lars

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1650/ ----------------------------------------------------------- Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary ------- This avoids flushing row versions to disk that are known to be GC'd by the next compaction anyway. This covers two scenarios: 1. maxVersions=N and we find at least N versions in the memstore. We can safely avoid flushing any further versions to disk. 2. similarly minVersions=N and we find at least N versions in the memstore. Now we can safely avoid flushing any further expired versions to disk. This changes the Store flush to use the same mechanism that used for compactions. I borrowed some code from the tests and refactored the test code to use a new utility class that wraps a sorted collection and then behaves like KeyValueScanner. The same class is used to create scanner over the memstore's snapshot. This addresses bug HBASE-4241 . https://issues.apache.org/jira/browse/HBASE-4241 Diffs http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueScanFixture.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1161347 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1161347 Diff: https://reviews.apache.org/r/1650/diff Testing ------- Ran all tests. TestHTablePool and TestDistributedLogSplitting error out (with or without my change). I had to change three tests that incorrectly relied on old rows hanging around after a flush (or were otherwise incorrect). No new test, as this should cause no functional change. Thanks, Lars
          Hide
          Lars Hofhansl added a comment -

          Good point that would make it more clear. I also discovered another problem that broken many of the tests. Fixed now.
          Once I get a complete test run through without failures I'll file a review-request.

          Show
          Lars Hofhansl added a comment - Good point that would make it more clear. I also discovered another problem that broken many of the tests. Fixed now. Once I get a complete test run through without failures I'll file a review-request.
          Hide
          Ted Yu added a comment -

          Since HBASE-4071 was New Feature, this JIRA should be something similar.

          Looking at CollectionBackedScanner again, all ctor's (implicitly) take a comparator. So the description of (anything iterable) is inaccurate.
          How about the following:

          + * Utility scanner that wraps a sortable collection and serves
          
          Show
          Ted Yu added a comment - Since HBASE-4071 was New Feature, this JIRA should be something similar. Looking at CollectionBackedScanner again, all ctor's (implicitly) take a comparator. So the description of (anything iterable) is inaccurate. How about the following: + * Utility scanner that wraps a sortable collection and serves
          Hide
          Ted Yu added a comment -

          Year should be 2011.

          + * Copyright 2009 The Apache Software Foundation
          

          I think a better description:

          + * Utility scanner that can wrap a collection (anything iterable)and pretend to
          + * be a KeyValueScanner.
          

          would be:

          + * Utility scanner that wraps a collection (anything iterable) and serves
          + * as a KeyValueScanner.
          

          Please run through unit tests.
          If they pass, please use reviewboard for further comments.

          Good work.

          Show
          Ted Yu added a comment - Year should be 2011. + * Copyright 2009 The Apache Software Foundation I think a better description: + * Utility scanner that can wrap a collection (anything iterable)and pretend to + * be a KeyValueScanner. would be: + * Utility scanner that wraps a collection (anything iterable) and serves + * as a KeyValueScanner. Please run through unit tests. If they pass, please use reviewboard for further comments. Good work.
          Hide
          Lars Hofhansl added a comment -

          Better patch. Renames ScannerWrapper to CollectionBackedScanner, also uses correct comparator from Store.
          (And also changes FIXED_OVERHEAD in Store.java once more as there is a new maxVersions member).

          Show
          Lars Hofhansl added a comment - Better patch. Renames ScannerWrapper to CollectionBackedScanner, also uses correct comparator from Store. (And also changes FIXED_OVERHEAD in Store.java once more as there is a new maxVersions member).
          Hide
          Lars Hofhansl added a comment -

          I like that. I had been struggling with a name for it.

          Show
          Lars Hofhansl added a comment - I like that. I had been struggling with a name for it.
          Hide
          Ted Yu added a comment -

          Interesting idea.

          +/**
          + * Utility scanner that can wrap a collection and pretend to
          + * be a KeyValueScanner.
          + */
          +public class ScannerWrapper implements KeyValueScanner {
          

          This class might be used elsewhere.
          How about naming it CollectionBackedScanner ?

          Show
          Ted Yu added a comment - Interesting idea. +/** + * Utility scanner that can wrap a collection and pretend to + * be a KeyValueScanner. + */ + public class ScannerWrapper implements KeyValueScanner { This class might be used elsewhere. How about naming it CollectionBackedScanner ?
          Hide
          Lars Hofhansl added a comment -

          Changed to "major" as I think this is an important optimization.

          Show
          Lars Hofhansl added a comment - Changed to "major" as I think this is an important optimization.
          Hide
          Lars Hofhansl added a comment -

          Might as well keep using this one...

          Please let me know what you think of the patch in general, then I'll clean it up, add my local unitests, and post for formal review.

          Show
          Lars Hofhansl added a comment - Might as well keep using this one... Please let me know what you think of the patch in general, then I'll clean it up, add my local unitests, and post for formal review.
          Hide
          Lars Hofhansl added a comment -

          Here's a sketch for a patch.
          Basically the KVs of the memstore's snapshot are wrapped in a scanner and then passed for a new StoreScanner.
          So we use the same logic as in compactions here, which is good.
          The reasoning why this is correct goes as follows:
          If maxVersion=N and we find more than N versions in the memstore we safely avoid writing all further versions.
          Same for minVersions, if minversions=N and we find at least N versions in the memstore we can avoid writing all further expired versions.

          Show
          Lars Hofhansl added a comment - Here's a sketch for a patch. Basically the KVs of the memstore's snapshot are wrapped in a scanner and then passed for a new StoreScanner. So we use the same logic as in compactions here, which is good. The reasoning why this is correct goes as follows: If maxVersion=N and we find more than N versions in the memstore we safely avoid writing all further versions. Same for minVersions, if minversions=N and we find at least N versions in the memstore we can avoid writing all further expired versions.
          Hide
          Lars Hofhansl added a comment -

          Turns out this is actually most useful to avoid flushing versions past the maxversion to disk, and only somewhat related to HBASE-4071.
          It's not that likely that TTL is so short that rows in the memstore expire before they are flushed to disk. however, there are scenarios (such as state or counter) tables where maxversions=1 and there bery frequent updates.

          I have a test patch and testing with such a scnerio and found the disk saving are significant. I think this warrant a top-level jira. I'll close one.

          Show
          Lars Hofhansl added a comment - Turns out this is actually most useful to avoid flushing versions past the maxversion to disk, and only somewhat related to HBASE-4071 . It's not that likely that TTL is so short that rows in the memstore expire before they are flushed to disk. however, there are scenarios (such as state or counter) tables where maxversions=1 and there bery frequent updates. I have a test patch and testing with such a scnerio and found the disk saving are significant. I think this warrant a top-level jira. I'll close one.

            People

            • Assignee:
              Lars Hofhansl
              Reporter:
              Lars Hofhansl
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development