Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: Client, regionserver
    • Labels:
      None

      Description

      I'm working on a secondary index impl. The basic idea is to maintain a separate table per index.

      1. secondary.patch
        3 kB
        stack
      2. hbase-883.patch
        60 kB
        Andrew Purtell
      3. hbase-883.patch
        61 kB
        Clint Morgan
      4. hbase-883.patch
        59 kB
        Clint Morgan
      5. hbase-883.patch
        62 kB
        Clint Morgan
      6. hbase-883.patch
        62 kB
        Clint Morgan
      7. hbase-883.patch
        58 kB
        Clint Morgan
      8. hbase-883.patch
        41 kB
        Clint Morgan

        Issue Links

          Activity

          Hide
          Eugene Koontz added a comment -

          related to the comment that stack added on 25/Sep/08 21:56

          "talk on the list has made mention of secondary indices done using lucene rather than keeping a second table."

          Show
          Eugene Koontz added a comment - related to the comment that stack added on 25/Sep/08 21:56 "talk on the list has made mention of secondary indices done using lucene rather than keeping a second table."
          Hide
          stack added a comment -

          Thanks for checking in Clint. Thanks for fixing Andrew.

          Show
          stack added a comment - Thanks for checking in Clint. Thanks for fixing Andrew.
          Hide
          Andrew Purtell added a comment -

          Committed fixes as described in previous comment to trunk. Passes all local tests.

          Show
          Andrew Purtell added a comment - Committed fixes as described in previous comment to trunk. Passes all local tests.
          Hide
          Clint Morgan added a comment -

          Thanks for pushing this through guys, I've been working on some other things. Sorry for lack of response...

          WRT the custom comparator "tax". Was this with the latest version of the patch? You pay the tax even when no customer comparator is set? Because I noticed a similar issue in older version of patch and fixed it (in 17/Oct/08 patch).

          But anyway, that impl of custom comparators does not work due to META issues that stack mentioned above...

          Show
          Clint Morgan added a comment - Thanks for pushing this through guys, I've been working on some other things. Sorry for lack of response... WRT the custom comparator "tax". Was this with the latest version of the patch? You pay the tax even when no customer comparator is set? Because I noticed a similar issue in older version of patch and fixed it (in 17/Oct/08 patch). But anyway, that impl of custom comparators does not work due to META issues that stack mentioned above...
          Hide
          Andrew Purtell added a comment - - edited

          Third time will be the charm. We took out the custom comparators because we can see them acting as a tax on the rest of HRS: 5-10% CPU, 20% of all object allocation (on a jgray server). But the custom comparators are used to reverse a test to provide descending sort orders. For now we'll degrade the secondary index capability somewhat by fixing it up to work without the custom comparators, hence no descending sort ordering.

          Show
          Andrew Purtell added a comment - - edited Third time will be the charm. We took out the custom comparators because we can see them acting as a tax on the rest of HRS: 5-10% CPU, 20% of all object allocation (on a jgray server). But the custom comparators are used to reverse a test to provide descending sort orders. For now we'll degrade the secondary index capability somewhat by fixing it up to work without the custom comparators, hence no descending sort ordering.
          Hide
          Andrew Purtell added a comment -

          +1

          Show
          Andrew Purtell added a comment - +1
          Hide
          stack added a comment -

          Backed out custom comparators; resolving issue again.

          Show
          stack added a comment - Backed out custom comparators; resolving issue again.
          Hide
          stack added a comment -

          Reopening until I back out comparators.

          Show
          stack added a comment - Reopening until I back out comparators.
          Hide
          stack added a comment -

          Andrew, I want to backout the stuff around comparators. I made a comment above but the committed patch still had them in there. They don't work, is my understanding, and as is, they are responsible for about 18% of the allocations on jgrays HRS.

          Show
          stack added a comment - Andrew, I want to backout the stuff around comparators. I made a comment above but the committed patch still had them in there. They don't work, is my understanding, and as is, they are responsible for about 18% of the allocations on jgrays HRS.
          Hide
          Andrew Purtell added a comment -

          Committed to TRUNK. Passed all local tests.

          Show
          Andrew Purtell added a comment - Committed to TRUNK. Passed all local tests.
          Hide
          stack added a comment -

          Want to commit it to TRUNK then Andrew?
          St.Ack

          Show
          stack added a comment - Want to commit it to TRUNK then Andrew? St.Ack
          Hide
          Andrew Purtell added a comment -

          I can walk this through.

          Show
          Andrew Purtell added a comment - I can walk this through.
          Hide
          Andrew Purtell added a comment -

          Attached updated patch that addresses all of my comments. I've been running this privately for some time without problems.

          Show
          Andrew Purtell added a comment - Attached updated patch that addresses all of my comments. I've been running this privately for some time without problems.
          Hide
          stack added a comment -

          Moving out of 0.19 unless Clint shows up soon addressing Andrew's feedback.

          Show
          stack added a comment - Moving out of 0.19 unless Clint shows up soon addressing Andrew's feedback.
          Hide
          Andrew Purtell added a comment -

          Another suggestion is to remove the @inheritDoc tag at o.a.h.h.client.tableindexed.IndexedTable.java:56 to fix a javadoc warning.

          Show
          Andrew Purtell added a comment - Another suggestion is to remove the @inheritDoc tag at o.a.h.h.client.tableindexed.IndexedTable.java:56 to fix a javadoc warning.
          Hide
          Andrew Purtell added a comment - - edited

          o.a.h.h.client.tableindexed.IndexScanner.ScannerWrapper needs next(int).

          A suggestion:

          
              /** {@inheritDoc} */
              public RowResult next() throws IOException {
                  RowResult[] result = next(1);
                  if (result == null || result.length < 1)
                    return null;
                  return result[0];
              }
          
              /** {@inheritDoc} */
              public RowResult[] next(int nbRows) throws IOException {
                RowResult[] indexResult = indexScanner.next(nbRows);
                if (indexResult == null) {
                  return null;
                }
                RowResult[] result = new RowResult[indexResult.length];
                for (int i = 0; i < indexResult.length; i++) {
                  RowResult row = indexResult[i];
                  byte[] baseRow = row.get(INDEX_BASE_ROW_COLUMN).getValue();
                  LOG.debug("next index row [" + Bytes.toString(row.getRow())
                      + "] -> base row [" + Bytes.toString(baseRow) + "]");
                  HbaseMapWritable<byte[], Cell> colValues =
                    new HbaseMapWritable<byte[], Cell>();
                  if (columns != null && columns.length > 0) {
                    LOG.debug("Going to base table for remaining columns");
                    RowResult baseResult = IndexedTable.this.getRow(baseRow, columns);
                    colValues.putAll(baseResult);
                  }
                  for (Entry<byte[], Cell> entry : row.entrySet()) {
                    byte[] col = entry.getKey();
                    if (HStoreKey.matchingFamily(INDEX_COL_FAMILY_NAME, col)) {
                      continue;
                    }
                    colValues.put(col, entry.getValue());
                  }
                  result[i] = new RowResult(baseRow, colValues);
                }
                return result;
              }
          
          
          Show
          Andrew Purtell added a comment - - edited o.a.h.h.client.tableindexed.IndexScanner.ScannerWrapper needs next(int). A suggestion: /** {@inheritDoc} */ public RowResult next() throws IOException { RowResult[] result = next(1); if (result == null || result.length < 1) return null ; return result[0]; } /** {@inheritDoc} */ public RowResult[] next( int nbRows) throws IOException { RowResult[] indexResult = indexScanner.next(nbRows); if (indexResult == null ) { return null ; } RowResult[] result = new RowResult[indexResult.length]; for ( int i = 0; i < indexResult.length; i++) { RowResult row = indexResult[i]; byte [] baseRow = row.get(INDEX_BASE_ROW_COLUMN).getValue(); LOG.debug( "next index row [" + Bytes.toString(row.getRow()) + "] -> base row [" + Bytes.toString(baseRow) + "]" ); HbaseMapWritable< byte [], Cell> colValues = new HbaseMapWritable< byte [], Cell>(); if (columns != null && columns.length > 0) { LOG.debug( "Going to base table for remaining columns" ); RowResult baseResult = IndexedTable. this .getRow(baseRow, columns); colValues.putAll(baseResult); } for (Entry< byte [], Cell> entry : row.entrySet()) { byte [] col = entry.getKey(); if (HStoreKey.matchingFamily(INDEX_COL_FAMILY_NAME, col)) { continue ; } colValues.put(col, entry.getValue()); } result[i] = new RowResult(baseRow, colValues); } return result; }
          Hide
          Andrew Purtell added a comment -

          We have a maintenance window this afternoon. I'll take the opportunity to set this up for indefinite use on our "big" 25 node cluster. There's one thing we do in particular where I'd like to change the result ordering of a scanner using a secondary index. Will report back if there are any problems.

          Show
          Andrew Purtell added a comment - We have a maintenance window this afternoon. I'll take the opportunity to set this up for indefinite use on our "big" 25 node cluster. There's one thing we do in particular where I'd like to change the result ordering of a scanner using a secondary index. Will report back if there are any problems.
          Hide
          stack added a comment -

          Clint:

          This won't work, right:

          +    } 
          +    if (regionInfo != null && regionInfo.getTableDesc().getRowKeyComparator() != null) {
          +      return regionInfo.getTableDesc().getRowKeyComparator().compare(rowA, rowB);
               }
          

          i.e. a table-specific comparator. Since the .META. and ROOT use different comparator, there is going to be a disagreement, someday.

          Remove it from above and from HTableDescriptor?

          Minor: The below should be wrapped in an if (LOG.isDebugEnable()) check:

          +    LOG.debug("Index [" + indexSpec.getIndexId() + "] adding new entry ["
          +        + Bytes.toString(indexUpdate.getRow()) + "] for row ["
          +        + Bytes.toString(row) + "]");
          

          Otherwise you're paying string creation and then just throwing it all way when running at INFO level.

          Minor: Can you not do putAll below?

          +    for(IndexSpecification index : indexes) {
          +      this.indexes.put(index.getIndexId(), index);
          +    }
          

          Minor: Missing javadoc on this class

          +public class ReverseByteArrayComparator implements WritableComparator<byte[]> {
          
          

          Minor: All data members in IndexSpecification look like they should be final; IndexSpecification looks immutable.

          Minor: When in hbase-land, its better to use HbaseObjectWritable rather than ObjectWritable. You'll get a little performance boost. You'll likely have to add new codes for your new classes; see head of HbaseObjectWritable.

          On above, the minor's are not important. Just FYI. I do think that removing the comparator important because could propagate wrong impression – that an table-specific comparator is a possibility (unless you know something I don't).

          Thanks Clint.

          Show
          stack added a comment - Clint: This won't work, right: + } + if (regionInfo != null && regionInfo.getTableDesc().getRowKeyComparator() != null ) { + return regionInfo.getTableDesc().getRowKeyComparator().compare(rowA, rowB); } i.e. a table-specific comparator. Since the .META. and ROOT use different comparator, there is going to be a disagreement, someday. Remove it from above and from HTableDescriptor? Minor: The below should be wrapped in an if (LOG.isDebugEnable()) check: + LOG.debug( "Index [" + indexSpec.getIndexId() + "] adding new entry [" + + Bytes.toString(indexUpdate.getRow()) + "] for row [" + + Bytes.toString(row) + "]" ); Otherwise you're paying string creation and then just throwing it all way when running at INFO level. Minor: Can you not do putAll below? + for (IndexSpecification index : indexes) { + this .indexes.put(index.getIndexId(), index); + } Minor: Missing javadoc on this class + public class ReverseByteArrayComparator implements WritableComparator< byte []> { Minor: All data members in IndexSpecification look like they should be final; IndexSpecification looks immutable. Minor: When in hbase-land, its better to use HbaseObjectWritable rather than ObjectWritable. You'll get a little performance boost. You'll likely have to add new codes for your new classes; see head of HbaseObjectWritable. On above, the minor's are not important. Just FYI. I do think that removing the comparator important because could propagate wrong impression – that an table-specific comparator is a possibility (unless you know something I don't). Thanks Clint.
          Hide
          stack added a comment -

          Lets add this to 0.19.0.

          Show
          stack added a comment - Lets add this to 0.19.0.
          Hide
          Clint Morgan added a comment -

          >> Secondary indexes run on top of a transactional hbase? Thats needed because insert into secondary is transactional with primary table?

          Yeah thats correct. I want transactional behavior to happen first, then indexes get updated iff a transaction is committed.

          Show
          Clint Morgan added a comment - >> Secondary indexes run on top of a transactional hbase? Thats needed because insert into secondary is transactional with primary table? Yeah thats correct. I want transactional behavior to happen first, then indexes get updated iff a transaction is committed.
          Hide
          stack added a comment -

          Clint: Secondary indexes run on top of a transactional hbase? Thats needed because insert into secondary is transactional with primary table? This patch makes some changes to a few core classes but they are a few only and they look like they do not require migration scripts because the classes migrate themselves. I like the package.html documentation.

          Has anyone else tried Clint's patch? Does it work for them?

          Show
          stack added a comment - Clint: Secondary indexes run on top of a transactional hbase? Thats needed because insert into secondary is transactional with primary table? This patch makes some changes to a few core classes but they are a few only and they look like they do not require migration scripts because the classes migrate themselves. I like the package.html documentation. Has anyone else tried Clint's patch? Does it work for them?
          Hide
          Clint Morgan added a comment -

          Latest version. Completes the support for deletes, fixes a spelling mistake.

          Show
          Clint Morgan added a comment - Latest version. Completes the support for deletes, fixes a spelling mistake.
          Hide
          Andrew Purtell added a comment -

          +1 after HBase-880 goes in and fixups are made. Being able to ordering a scan on/by a secondary index is an important feature for HBase users, and it's great that you've done the heavy lifting for the rest of us Clint. If another means for indexing tables is added later, the appropriate refactoring can be done at that time to make everything make sense and play together.

          Show
          Andrew Purtell added a comment - +1 after HBase-880 goes in and fixups are made. Being able to ordering a scan on/by a secondary index is an important feature for HBase users, and it's great that you've done the heavy lifting for the rest of us Clint. If another means for indexing tables is added later, the appropriate refactoring can be done at that time to make everything make sense and play together.
          Hide
          Clint Morgan added a comment -

          Andrew: you're right that I don't need to have the transactional stuff inherit from the indexed stuff. My intuition to do this came from the need to have transactional logic happen first, and then the indexing stuff only happen as a transaction is successfully committed. However, I can make this work with the normal polymorphic mechanism...

          This patch does that, as well as address a couple of performance issues I found with profiling.

          Show
          Clint Morgan added a comment - Andrew: you're right that I don't need to have the transactional stuff inherit from the indexed stuff. My intuition to do this came from the need to have transactional logic happen first, and then the indexing stuff only happen as a transaction is successfully committed. However, I can make this work with the normal polymorphic mechanism... This patch does that, as well as address a couple of performance issues I found with profiling.
          Hide
          Andrew Purtell added a comment -

          Hi Clint. Does it have to be that transaction classes inherit from index classes? Is it possible to have transactions without indexes? Wouldn't having the index classes inherit from the transactional ones accomplish the same thing? Or am I missing something?

          Show
          Andrew Purtell added a comment - Hi Clint. Does it have to be that transaction classes inherit from index classes? Is it possible to have transactions without indexes? Wouldn't having the index classes inherit from the transactional ones accomplish the same thing? Or am I missing something?
          Hide
          Clint Morgan added a comment -

          There was an index maintenance bug introduced in last patch, this fixes it.

          (IndexedRegion:123) oldEntry.getValue() -> oldEntry.getKey()

          Show
          Clint Morgan added a comment - There was an index maintenance bug introduced in last patch, this fixes it. (IndexedRegion:123) oldEntry.getValue() -> oldEntry.getKey()
          Hide
          Clint Morgan added a comment -

          Latest version, fixed bug related to index maintenance, added a minimal package.html doc

          Show
          Clint Morgan added a comment - Latest version, fixed bug related to index maintenance, added a minimal package.html doc
          Hide
          Clint Morgan added a comment -

          Thanks for the review, responded to most of it:

          The stuff dealing with RowKeyCompartors is my half-arse attempt at
          661. It seems to work but I have not tried it when regions start to
          split, which i think is where the issues will arise.

          I need RowKeyComparators, because some indexes (those that go in
          reverse order, or deserialize the index keys (column values from base
          table) to order by EG long values) need to sort non-lexicographically.

          This WritableComparator is diff that the one in hadoop. Mine just brings the
          two interfaces together, while the hadoop version is a general class
          for comparing byte[]s.

          Made PerfEval.fomat public

          Changed package names to tableindexed

          Regards including the index specs in HTD, it made sense to me. I see
          the pollution concern, but indexes seem like a fundamental part of the
          tables meta-data. I can rework in subsequent patch If need be.

          This version has transactional tables/regions inherit from the
          indexed tables/regions. This will allow the index updates to have
          proper transactional behavior (when updated as part of a transaction).

          I've tested this in my object-datastore layer above hbase, and passes some basic
          tests. There is a reasonable deal of logic that goes into the
          RowKeyGenerator as it must carefully build the index keys to get the
          desired ordering. I'm building my index keys prefixed with the same
          prefix used in the original keys to maintain "sharding". This
          way I can do an quick index scan for data from a give domain (EG, for
          a specific customer).

          Show
          Clint Morgan added a comment - Thanks for the review, responded to most of it: The stuff dealing with RowKeyCompartors is my half-arse attempt at 661. It seems to work but I have not tried it when regions start to split, which i think is where the issues will arise. I need RowKeyComparators, because some indexes (those that go in reverse order, or deserialize the index keys (column values from base table) to order by EG long values) need to sort non-lexicographically. This WritableComparator is diff that the one in hadoop. Mine just brings the two interfaces together, while the hadoop version is a general class for comparing byte[]s. Made PerfEval.fomat public Changed package names to tableindexed Regards including the index specs in HTD, it made sense to me. I see the pollution concern, but indexes seem like a fundamental part of the tables meta-data. I can rework in subsequent patch If need be. This version has transactional tables/regions inherit from the indexed tables/regions. This will allow the index updates to have proper transactional behavior (when updated as part of a transaction). I've tested this in my object-datastore layer above hbase, and passes some basic tests. There is a reasonable deal of logic that goes into the RowKeyGenerator as it must carefully build the index keys to get the desired ordering. I'm building my index keys prefixed with the same prefix used in the original keys to maintain "sharding". This way I can do an quick index scan for data from a give domain (EG, for a specific customer).
          Hide
          stack added a comment -

          + This looks like it could soon become annoying '+ LOG.info("Index [" + indexSpec.getIndexId()'... should be DEBUG-level at least.
          + How can we make it so you don't have to do this to HTD:

          +  private final Map<String, IndexSpecification> indexes =
          +    new HashMap<String, IndexSpecification>();
          

          It pollutes HTD with indices. Should we add a map that subclasses such as this Indexer can stuff things like indexes into? Rather than have each specialization add to HTD?
          + More pollution from another patch: WritableComparator.java... and besides, ain't this up in hadoop? And ReverseByteArrayComparator. (These new classes are lacking licenses).

          I didn't try it but patch looks generally good to me.

          One thought is that this fancy feature really should be an option on default hbase. Creating your table, it'd be an option spec'ing secondary indices. But that can wait. Lets do it this way where its as a subclass first. If the demand, we can move it back into core hbase.

          Show
          stack added a comment - + This looks like it could soon become annoying '+ LOG.info("Index [" + indexSpec.getIndexId()'... should be DEBUG-level at least. + How can we make it so you don't have to do this to HTD: + private final Map< String , IndexSpecification> indexes = + new HashMap< String , IndexSpecification>(); It pollutes HTD with indices. Should we add a map that subclasses such as this Indexer can stuff things like indexes into? Rather than have each specialization add to HTD? + More pollution from another patch: WritableComparator.java... and besides, ain't this up in hadoop? And ReverseByteArrayComparator. (These new classes are lacking licenses). I didn't try it but patch looks generally good to me. One thought is that this fancy feature really should be an option on default hbase. Creating your table, it'd be an option spec'ing secondary indices. But that can wait. Lets do it this way where its as a subclass first. If the demand, we can move it back into core hbase.
          Hide
          stack added a comment -

          Hey Clint:

          Highlevel, talk on the list has made mention of secondary indices done using lucene rather than keeping a second table. Because of this, one suggestion to consider – non-binding, just a suggestion – is that you might include the type of your secondary index – i.e. table index – when naming classes, etc. For example, IndexedRegionServer should perhaps become TableIndexedRegionServer (bit of a mouthful – maybe you have a better name) or maybe better, just change your package name – use tableindexed or tindexed instead of indexed – so its clear how your secondary index is implemented.

          In HTD:

          + Is this inclusion intentional: '+ public static final String ROW_KEY_COMPARATOR = "ROW_KEY_COMPARATOR";'? Is this leak from another patch? Same for HSK.java.... and setRowKeyComparator in HTD, etc.
          + You have to copy '+ static private byte[] format(final int number) {' from PerformanceEvaluation because its inaccessible? I'd suggest change access on PE so you don't have to duplicate.

          ... more to follow

          Show
          stack added a comment - Hey Clint: Highlevel, talk on the list has made mention of secondary indices done using lucene rather than keeping a second table. Because of this, one suggestion to consider – non-binding, just a suggestion – is that you might include the type of your secondary index – i.e. table index – when naming classes, etc. For example, IndexedRegionServer should perhaps become TableIndexedRegionServer (bit of a mouthful – maybe you have a better name) or maybe better, just change your package name – use tableindexed or tindexed instead of indexed – so its clear how your secondary index is implemented. In HTD: + Is this inclusion intentional: '+ public static final String ROW_KEY_COMPARATOR = "ROW_KEY_COMPARATOR";'? Is this leak from another patch? Same for HSK.java.... and setRowKeyComparator in HTD, etc. + You have to copy '+ static private byte[] format(final int number) {' from PerformanceEvaluation because its inaccessible? I'd suggest change access on PE so you don't have to duplicate. ... more to follow
          Hide
          Clint Morgan added a comment -

          I'm on vacation next week, but plan to wrap this up the week after.

          Show
          Clint Morgan added a comment - I'm on vacation next week, but plan to wrap this up the week after.
          Hide
          Clint Morgan added a comment -

          Need to specify per-table row key comparator

          Show
          Clint Morgan added a comment - Need to specify per-table row key comparator
          Hide
          Clint Morgan added a comment -

          In progress, incomplete, version with a working test. The test should be enough to communicate the API and semantics. Feedback welcomed.

          Show
          Clint Morgan added a comment - In progress, incomplete, version with a working test. The test should be enough to communicate the API and semantics. Feedback welcomed.

            People

            • Assignee:
              Andrew Purtell
              Reporter:
              Clint Morgan
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development