HBASE-2037

Alternate indexed HBase implementation; speeds scans by adding indexes to regions rather than secondary tables

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.3
    • Component/s: Coprocessors
    • Labels:
      None

      Description

      Purpose

      The goal of the indexed HBase contrib is to speed up scans by indexing HBase columns. Indexed HBase (IHBase) differs from the indexed tables in transactional HBase (ITHBase): while the indexes in ITHBase are, in fact, HBase tables that use the indexed column's values as row keys, IHBase creates indexes at the region level. The differences are summarized below.

      + global ordering
      ITHBase: yes
      IHBase: no
      Comment: IHBase has an index for each region. The flip side of not having global ordering is compatibility with the good old HRegion: results come back in row order (not in value order, as in ITHBase)

      + Full table scan?
      ITHBase: no
      IHBase: no
      Comment: ITHBase does a partial scan on the index table. IHBase supports specifying start/end rows to limit the number of scanned regions

      + Multiple Index Usage
      ITHBase: no
      IHBase: yes
      Comment: IHBase can take advantage of multiple indexes in the same scan. The IHBase IdxScan object accepts an Expression, which allows intersection/union of several indexed column criteria

      + Extra disk storage
      ITHBase: yes
      IHBase: no
      Comment: IHBase indexes are created when the region starts or flushes and do not require any extra storage

      + Extra RAM
      ITHBase: yes
      IHBase: yes
      Comment: IHBase indexes are held in memory and hence increase the memory overhead. ITHBase indexes increase the number of regions each region server has to serve, thus costing memory too

      + Parallel scanning support
      ITHBase: no
      IHBase: yes
      Comment: In ITHBase the index table needs to be consulted, and then GETs are issued for each matching row. As perceived by the client, IHBase behaves no differently from a regular scan and hence supports parallel scanning seamlessly. Parallel GETs could be implemented to speed up ITHBase scans
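
      The multi-index row of the comparison can be made concrete with a small sketch. It assumes each region's index reports matches as a bitset of row ordinals (rows numbered in row-key order within the region) and models an IdxScan Expression as a tree of AND/OR nodes. The names IdxExpression, IndexHits, AndExpression and OrExpression are hypothetical and are not the IHBase API.

```java
import java.util.BitSet;

/** Hypothetical expression node: evaluates to the matching row ordinals within one region. */
interface IdxExpression {
  BitSet evaluate();
}

/** Leaf node: the rows that one column index reports for a single criterion. */
final class IndexHits implements IdxExpression {
  private final BitSet rows;

  IndexHits(BitSet rows) {
    this.rows = rows;
  }

  /** Convenience factory taking row ordinals directly. */
  static IndexHits of(int... ordinals) {
    BitSet bits = new BitSet();
    for (int ordinal : ordinals) {
      bits.set(ordinal);
    }
    return new IndexHits(bits);
  }

  @Override
  public BitSet evaluate() {
    return (BitSet) rows.clone(); // callers mutate the result, so never hand out the index's own set
  }
}

/** Intersection: a row matches only if every child matches it. */
final class AndExpression implements IdxExpression {
  private final IdxExpression[] children;

  AndExpression(IdxExpression... children) {
    this.children = children;
  }

  @Override
  public BitSet evaluate() {
    BitSet result = null;
    for (IdxExpression child : children) {
      BitSet bits = child.evaluate();
      if (result == null) {
        result = bits;
      } else {
        result.and(bits);
      }
    }
    return result == null ? new BitSet() : result; // an empty AND matches nothing in this sketch
  }
}

/** Union: a row matches if any child matches it. */
final class OrExpression implements IdxExpression {
  private final IdxExpression[] children;

  OrExpression(IdxExpression... children) {
    this.children = children;
  }

  @Override
  public BitSet evaluate() {
    BitSet result = new BitSet();
    for (IdxExpression child : children) {
      result.or(child.evaluate());
    }
    return result;
  }
}
```

      Because the result is indexed by row ordinal, walking it with `for (int r = rows.nextSetBit(0); r >= 0; r = rows.nextSetBit(r + 1))` visits matches in row order, which is consistent with the "global ordering" row above.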

      Why IHBase should outperform ITHBase
      1. More flexible: (a) supports range queries and multi-index queries; (b) supports different types, not only byte arrays
      2. Less overhead: ITHBase pays at least two 'table round trips', one for the index table and one for the main table
      3. Quicker index expression evaluation: IHBase uses dedicated index data structures, while ITHBase uses the regular HRegion scan facilities
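
      Point 1 (range queries over typed values, not only byte arrays) can be sketched as a per-region index that maps each distinct value of a long-typed column to the rows holding it. The map is kept sorted so that a range query is just a submap walk. LongColumnIndex is a hypothetical name, and using a TreeMap is an assumption; it is not a description of the dedicated data structures IHBase actually uses.

```java
import java.util.BitSet;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical per-region index over a long-typed column: each distinct value maps to
 * the ordinals (positions in row-key order) of the rows that hold it.
 */
final class LongColumnIndex {
  private final NavigableMap<Long, BitSet> valueToRows = new TreeMap<Long, BitSet>();

  /** Records that the row at rowOrdinal holds value; called while (re)building the index. */
  void add(long value, int rowOrdinal) {
    BitSet rows = valueToRows.get(value);
    if (rows == null) {
      rows = new BitSet();
      valueToRows.put(value, rows);
    }
    rows.set(rowOrdinal);
  }

  /** Rows whose value v satisfies from <= v < to, answered from the sorted map alone. */
  BitSet range(long from, long to) {
    BitSet result = new BitSet();
    if (from >= to) {
      return result; // empty range; also avoids subMap's IllegalArgumentException when from > to
    }
    for (BitSet rows : valueToRows.subMap(from, true, to, false).values()) {
      result.or(rows);
    }
    return result;
  }
}
```

      No table round trip is involved: the answer is a row bitset that can be combined with other criteria as described in the multi-index comparison.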

      Implementation notes
      • Only store files are indexed. Every index scan performs a full memstore scan. Indexing the memstore will be implemented only if scanning the memstore proves to be a performance bottleneck
      • Index expression evaluation is performed using bit sets. There are two types of bit sets: compressed and expanded. An index will typically store a compressed bit set, while an expression evaluator will most probably use an expanded bit set
      + TODO
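
      The two bit set types from the implementation notes can be sketched as follows. This assumes "compressed" means a sorted array of the set row ordinals and "expanded" means one bit per row packed into long words; both class names are hypothetical, and the real IHBase compressed format may differ. The idea is that the index keeps the compact form and the evaluator expands it once, so AND/OR become cheap word-by-word operations.

```java
import java.util.Arrays;

/** "Compressed" form as assumed here: the sorted ordinals of the rows that are set. */
final class SparseRowSet {
  private final int[] ordinals;

  SparseRowSet(int... ordinals) {
    this.ordinals = ordinals.clone();
    Arrays.sort(this.ordinals);
  }

  /** Expands into a dense set covering rowCount rows (every ordinal must be below rowCount). */
  DenseRowSet expand(int rowCount) {
    DenseRowSet dense = new DenseRowSet(rowCount);
    for (int ordinal : ordinals) {
      dense.set(ordinal);
    }
    return dense;
  }
}

/** "Expanded" form: one bit per row, 64 rows per long word. */
final class DenseRowSet {
  private final long[] words;

  DenseRowSet(int rowCount) {
    words = new long[(rowCount + 63) >>> 6];
  }

  void set(int ordinal) {
    words[ordinal >>> 6] |= 1L << (ordinal & 63);
  }

  boolean get(int ordinal) {
    return (words[ordinal >>> 6] & (1L << (ordinal & 63))) != 0;
  }

  /** In-place intersection, 64 rows at a time; both sets must cover the same rows. */
  void and(DenseRowSet other) {
    for (int w = 0; w < words.length; w++) {
      words[w] &= other.words[w];
    }
  }

  /** In-place union, 64 rows at a time. */
  void or(DenseRowSet other) {
    for (int w = 0; w < words.length; w++) {
      words[w] |= other.words[w];
    }
  }

  int cardinality() {
    int count = 0;
    for (long word : words) {
      count += Long.bitCount(word);
    }
    return count;
  }
}
```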

      This patch changes some of the HBase core so that region implementations other than the default HRegion can be instantiated. It also fixes filter bugs.
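
      The core change that allows a non-default region is a factory that reads the region class from configuration and instantiates it reflectively. According to the patch notes, this is done with HConstants.REGION_IMPL and a static HRegion.newHRegion. The sketch below shows the pattern with stand-in classes. BaseRegion, IndexedRegion and RegionFactory are hypothetical, java.util.Properties stands in for the HBase configuration, and the key string is an assumed value for REGION_IMPL. The real HRegion constructor takes many more arguments.

```java
import java.lang.reflect.Constructor;
import java.util.Properties;

/** Stand-in for HRegion; the real constructor takes many more arguments. */
class BaseRegion {
  protected final String name;

  BaseRegion(String name) {
    this.name = name;
  }
}

/** Stand-in for an indexed subclass such as IdxRegion. */
class IndexedRegion extends BaseRegion {
  IndexedRegion(String name) {
    super(name);
  }
}

final class RegionFactory {
  /** Assumed configuration key standing in for HConstants.REGION_IMPL. */
  static final String REGION_IMPL_KEY = "hbase.hregion.impl";

  /** Instantiates the configured region class, falling back to the default implementation. */
  static BaseRegion newRegion(Properties conf, String name) {
    String className = conf.getProperty(REGION_IMPL_KEY, BaseRegion.class.getName());
    try {
      Class<? extends BaseRegion> clazz = Class.forName(className).asSubclass(BaseRegion.class);
      Constructor<? extends BaseRegion> ctor = clazz.getDeclaredConstructor(String.class);
      ctor.setAccessible(true); // constructors in this sketch are package-private
      return ctor.newInstance(name);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate region class " + className, e);
    }
  }
}
```

      Routing every instantiation through one factory is what lets the region server, HMerge and the tests all pick up the configured subclass without further changes.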

      I would like to add this as a contrib package on the 0.20 branch in time for 0.20.3, if possible.

      1. index.html (21 kB, stack)
      2. idx-hbase3.patch (2.53 MB, stack)
      3. idx-hbase2.patch (2.46 MB, stack)


          Activity

          stack added a comment -

          This is a git patch. It includes the Apache commons-lang 2.4 jar, which is used by the modified Pair class and by some of the indexed classes.

          Other changes:

          • src/contrib/build-contrib.xml Enabled assertions (-ea) on contrib tests
          • src/java/org/apache/hadoop/hbase/HConstants.java Added a constant (REGION_IMPL) for the region implementation configuration
          • src/java/org/apache/hadoop/hbase/HMerge.java Replaced region construction with the factory method HRegion.newHRegion
          • src/java/org/apache/hadoop/hbase/client/Scan.java Added a 'values' map to the Scan object. It serves the same purpose as the HColumnDescriptor values.
          • src/java/org/apache/hadoop/hbase/filter/FilterList.java Fixed a bug in the filterKeyValue method that prevented encapsulated filters from seeing the key-value when the operator is MUST_PASS_ONE and a filter other than the last one in the filter list asked to include the record.
          • src/java/org/apache/hadoop/hbase/regionserver/HRegion.java
            1. Added a static method newHRegion that takes the region class denoted by HConstants.REGION_IMPL into account when instantiating the region
            2. Changed all region instantiations to use the above method
            3. Added a comment to the HRegion constructor stating that it should not be called directly from production code
            4. Widened the visibility of internalFlushCache to protected
            5. Changed the internalFlushCache method implementation to use the StoreFlusher and to block scans while a store flush is committed
            6. Added a hook, internalPreFlushcacheCommit, which the IdxRegion uses to rebuild the index
            7. Deletes and puts now block the creation of a new scanner (and vice versa). The need for this is demonstrated by a test method added to TestHRegion. Note that only scanner creation is blocked.
            8. Added the method RegionScanner.fillNextResults to allow the IdxRegionScanner to fast-forward the storeHeap after the results are read from the storeHeap.
            9. Added a protected method instantiateInternalScanner and changed the instantiation in the getScanner method to use it
            Document generated by Confluence on 09 Dec 2009, 16:41
          • src/test/org/apache/hadoop/hbase/regionserver/TestHRegion.java
            1. Widened the visibility of the final table, qual, value and row variables to protected
            2. Split the private initHRegion method into two protected methods. This change allows a subclass with minimal code to inherit from this test and run all its tests on IdxRegion instances
            3. Added a number of tests, most of them multithreaded, to demonstrate and verify operation under concurrent writes, flushes, gets and scans. Most of the added tests fail if you don't apply the rest of the patch
          • src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java Changed HRegion instantiation to use the static HRegion.newHRegion method
          • src/java/org/apache/hadoop/hbase/regionserver/KeyValueSkipListSet.java Made cloneable to support the new MemStoreScanner implementation
          • src/java/org/apache/hadoop/hbase/regionserver/MemStore.java
            1. Removed the observer pattern for scanners, which caused ongoing writes to slow down scans
            2. Adjusted FIXED_OVERHEAD since the observer list was removed
            3. Ripped out MemStoreScanner and put it in its own file
            4. Some whitespace changes which I'm too lazy to revert
          • src/java/org/apache/hadoop/hbase/regionserver/MemStoreScanner.java MemStoreScanner was completely reimplemented. The old implementation had a bug that prevented it from scanning correctly while a snapshot existed, and it was reset every time the store was written to. The price paid for making it independent of writes is cloning the kvset and snapshot when the scanner is created. My calculation shows that this costs approximately 4K per clone; I have a note to add a test to verify that. The need to change memstore scans arose from failing functional tests (added to TestHRegion). We didn't do any performance optimizations; we only checked that it didn't make things worse.
          • src/java/org/apache/hadoop/hbase/regionserver/ScanDeleteTracker.java Changed < to <=. A test was added to TestHRegion.
          • src/java/org/apache/hadoop/hbase/regionserver/Store.java Store flushing was broken out into a StoreFlusher (below). The need arose from functional test failures (TestHRegion) and from the need to add a hook for index recreation
          • src/test/org/apache/hadoop/hbase/regionserver/TestStore.java Fixed to use the new implementation of flushCache
          • src/java/org/apache/hadoop/hbase/regionserver/StoreFlusher.java An interface used by HRegion to flush stores. The prepare() and commit() methods are hooks for quick switches:
            • prepare() switches the MemStore's kvset to snapshot
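
          The StoreFlusher split and the pre-commit hook can be sketched as a three-phase flush. First, a quick prepare() runs under a lock that blocks updates. Second, the slow flushCache() runs with no region-wide lock held. Third, a quick commit() runs under a lock that blocks scanner creation. SimpleStoreFlusher, FlushingRegion, internalPreFlushCommitHook and the two-lock layout are assumptions made for illustration. The notes above do not spell out HRegion's actual locking, and the placement of the hook (just before the commits) is also assumed.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Simplified flush contract modelled on the notes above; not the real HBase interface. */
interface SimpleStoreFlusher {
  void prepare();                       // quick: switch the MemStore's kvset to a snapshot
  void flushCache() throws IOException; // slow: write the snapshot to a new store file
  void commit() throws IOException;     // quick: make the new file visible, drop the snapshot
}

class FlushingRegion {
  /** Write side held while snapshots are taken; puts/deletes would take the read side (not shown). */
  private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
  /** Write side held while flushes are committed, so new scanners never see a half-done switch. */
  private final ReentrantReadWriteLock scannerLock = new ReentrantReadWriteLock();
  private final List<SimpleStoreFlusher> flushers;

  FlushingRegion(List<SimpleStoreFlusher> flushers) {
    this.flushers = flushers;
  }

  void internalFlushCache() throws IOException {
    updatesLock.writeLock().lock();
    try {
      for (SimpleStoreFlusher f : flushers) {
        f.prepare();
      }
    } finally {
      updatesLock.writeLock().unlock();
    }
    // The expensive I/O runs without any region-wide lock held.
    for (SimpleStoreFlusher f : flushers) {
      f.flushCache();
    }
    scannerLock.writeLock().lock();
    try {
      internalPreFlushCommitHook(); // e.g. an indexed region rebuilds its index here
      for (SimpleStoreFlusher f : flushers) {
        f.commit();
      }
    } finally {
      scannerLock.writeLock().unlock();
    }
  }

  /** Hook for subclasses; hypothetical name standing in for the patch's pre-commit hook. */
  protected void internalPreFlushCommitHook() throws IOException {
  }

  /** Scanner creation takes the read side, so it waits while a commit is in progress. */
  void openScanner(Runnable createScanner) {
    scannerLock.readLock().lock();
    try {
      createScanner.run();
    } finally {
      scannerLock.readLock().unlock();
    }
  }
}
```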
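
          The FilterList fix for MUST_PASS_ONE can be illustrated with a simplified filter contract. Every wrapped filter must be offered each key-value, even after an earlier filter has already decided to include it, and the list includes the record if any filter does. CellFilter, FilterDecision and MustPassOneList are hypothetical simplifications, not the HBase Filter API.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified per-cell verdict for this sketch. */
enum FilterDecision { INCLUDE, SKIP }

/** Hypothetical stand-in for a filter that inspects one key-value at a time. */
interface CellFilter {
  FilterDecision filterKeyValue(String keyValue);
}

/** OR-combination of filters (MUST_PASS_ONE semantics) that offers every cell to every filter. */
final class MustPassOneList implements CellFilter {
  private final List<CellFilter> filters;

  MustPassOneList(CellFilter... filters) {
    this.filters = Arrays.asList(filters);
  }

  @Override
  public FilterDecision filterKeyValue(String keyValue) {
    boolean anyIncluded = false;
    for (CellFilter filter : filters) {
      // No early return on INCLUDE: stateful filters later in the list must still see the cell.
      if (filter.filterKeyValue(keyValue) == FilterDecision.INCLUDE) {
        anyIncluded = true;
      }
    }
    return anyIncluded ? FilterDecision.INCLUDE : FilterDecision.SKIP;
  }
}
```

          Returning as soon as one filter answers INCLUDE would give the same verdict for that record. However, filters that keep state across cells (counters, "already seen" flags) would silently miss input, which matches the symptom described for the bug.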
          Andrew Purtell added a comment -

          Maybe this can go into 0.20.3 as-is, but for 0.21 (or 0.22) we can do something else? Except for the changes to core, this seems like something that should be implemented as a coprocessor. It would also serve as a goalpost for the coprocessor stuff.

          stack added a comment -

          Yes, it seems like a good candidate to be done as a coprocessor. Let me see about including it in 0.20.3. It includes a few significant changes to core.

          Andrew Purtell added a comment -

          Some of those changes, like MemStore refactoring and StoreFlusher, have a lot of merit in their own right.

          stack added a comment -

          All tests pass except this one:

          [junit] Test org.apache.hadoop.hbase.regionserver.TestGetDeleteTracker FAILED

          All other core tests are passing.

          Looking into it.

          This new patch includes the following:

          • Scan is now backward compatible: it uses a negative number for the version instead of a Boolean overload. This change doesn't support sending a 0.20.3 Scan to a 0.20.2 region
          • Copyright fixes
          • More tests for heap usage
          • One more IdxRegion test
          • JMX Bean for the IdxRegion
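
          One reading of the Scan compatibility note is sketched below. The writer emits a negative version byte to signal that the new 'values' map follows, and the reader treats a positive version as the old layout with no map. This keeps old (0.20.2) clients readable by a new server, while an old server would not understand the negative marker. VersionedScan, its fields and both version constants are hypothetical; the real Scan serializes many more fields.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, heavily reduced stand-in for the Writable encoding of a Scan. */
final class VersionedScan {
  /** Assumed version byte written by clients that predate the values map (positive). */
  static final byte LEGACY_VERSION = 1;
  /** Assumed version of the new layout; written negated to flag that a values map follows. */
  static final byte VALUES_VERSION = 2;

  byte[] startRow = new byte[0];
  final Map<String, String> values = new TreeMap<String, String>();

  void write(DataOutput out) throws IOException {
    out.writeByte(-VALUES_VERSION);
    writeByteArray(out, startRow);
    out.writeInt(values.size());
    for (Map.Entry<String, String> e : values.entrySet()) {
      out.writeUTF(e.getKey());
      out.writeUTF(e.getValue());
    }
  }

  void readFields(DataInput in) throws IOException {
    byte version = in.readByte();
    startRow = readByteArray(in);
    values.clear();
    if (version < 0) { // new layout: the values map follows the legacy fields
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        values.put(in.readUTF(), in.readUTF());
      }
    }
    // A positive version is the legacy layout, which carries no values map.
  }

  private static void writeByteArray(DataOutput out, byte[] bytes) throws IOException {
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  private static byte[] readByteArray(DataInput in) throws IOException {
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    return bytes;
  }
}
```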
          stack added a comment -

          All contrib tests, including all of the new ones, are also passing, and the core test failure above seems to be transient:

          Testsuite: org.apache.hadoop.hbase.regionserver.TestGetDeleteTracker
          Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.056 sec
          ------------- Standard Output ---------------
          Qf col, timestamp, 1262675527734260000, type Delete
          Qf col, timestamp, 1262675527734259000, type DeleteColumn
          Qf col2, timestamp, 1262675527734259000, type Delete
          ------------- ---------------- ---------------
          
          Testcase: testUpdate_CompareDeletes took 0.003 sec
          Testcase: testUpdate took 0.002 sec
          Testcase: testIsDeleted_NotDeleted took 0 sec
          Testcase: testIsDeleted_Delete took 0 sec
          Testcase: testIsDeleted_DeleteColumn took 0 sec
          Testcase: testIsDeleted_DeleteFamily took 0 sec
          Testcase: testStackOverflow took 0.037 sec
          
          stack added a comment -

          I want to run a few tests with this patch in place up on a cluster to be sure that its refactoring, such as the changed registration of store files under load, hasn't broken anything.

          stack added a comment -

          Here is an HTML file that needs to be converted into an overview document for this new contribution.

          stack added a comment -

          I committed this. I will be testing the 0.20.3 RC anyway, so I can test this patch at that time. I created HBASE-2092 to make a version of this patch for TRUNK.

          Bassam Tabbara added a comment -

          Not sure if this was intended, but the commons-lang 2.4 jar must now be on the HADOOP_CLASSPATH in order to run HBase mapred jobs (like RowCounter).

          Jean-Daniel Cryans added a comment -

          So any client using getStartRows (which creates a Pair object that uses HashCodeBuilder) will now need the new jar. This breaks compatibility.
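
          The break comes from Pair depending on commons-lang's HashCodeBuilder. The thread does not describe the eventual fix; one dependency-free option, sketched here with a hypothetical SimplePair class, is to write equals and hashCode by hand so that client code needs no extra jar on its classpath.

```java
/** Hypothetical dependency-free pair: equals/hashCode written by hand, no HashCodeBuilder. */
final class SimplePair<A, B> {
  private final A first;
  private final B second;

  SimplePair(A first, B second) {
    this.first = first;
    this.second = second;
  }

  A getFirst() {
    return first;
  }

  B getSecond() {
    return second;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof SimplePair)) {
      return false;
    }
    SimplePair<?, ?> other = (SimplePair<?, ?>) o;
    return equal(first, other.first) && equal(second, other.second);
  }

  @Override
  public int hashCode() {
    // Combine the component hashes with a 31 multiplier; a null component hashes to 0.
    int h = first == null ? 0 : first.hashCode();
    return 31 * h + (second == null ? 0 : second.hashCode());
  }

  /** Null-safe equality for the components. */
  private static boolean equal(Object a, Object b) {
    return a == null ? b == null : a.equals(b);
  }
}
```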

          stack added a comment -

          I'm looking into it. If I can't fix this, I think we should back out HBASE-2037.

          stack added a comment (edited) -

          I created HBASE-2094 as a blocker for 0.20.3.

          stack added a comment -

          Committed to branch. Opened another issue to apply to TRUNK.


            People

            • Assignee: stack
            • Reporter: stack
            • Votes: 2
            • Watchers: 10