HBase / HBASE-605

Allow scanners which return results ordered by a column value

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.19.0
    • Component/s: Client, regionserver
    • Labels:
      None

      Description

      We would like to be able to scan through tables with results ordered by (deserialized) column values. This approach maintains an in-memory sorted set for each ordered-by column in each HStore. This allows us to iterate through the keys in column order, and to do random reads on each key to get the full row.

      Without the index, we have to scan through all the rows to get the first result ordered by a column. Thus, when R is the number of rows in a table, N is the number of ordered-by rows we want, and R >> N, we can save a lot of work by not doing the full table scan.
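      The idea can be illustrated with a minimal, self-contained sketch. It is not code from the attached patches: the class OrderedColumnIndexSketch, its Entry type, and the methods put and firstRowKeys are hypothetical names, the column value is assumed to deserialize to a long, and a plain HashMap stands in for the store. It shows one sorted set kept per ordered-by column, from which the first N row keys are read without visiting all R rows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from the HBASE-605 patches.
final class OrderedColumnIndexSketch {

    // One index entry: a deserialized column value plus the key of the row holding it.
    static final class Entry {
        final long value;
        final String rowKey;

        Entry(long value, String rowKey) {
            this.value = value;
            this.rowKey = rowKey;
        }
    }

    // Order by column value; break ties on row key so rows with equal values are both kept.
    static final Comparator<Entry> BY_VALUE_THEN_KEY = new Comparator<Entry>() {
        public int compare(Entry a, Entry b) {
            if (a.value != b.value) {
                return a.value < b.value ? -1 : 1;
            }
            return a.rowKey.compareTo(b.rowKey);
        }
    };

    // Stand-in for the store: row key -> current value of the ordered-by column.
    private final Map<String, Long> rows = new HashMap<String, Long>();
    // The in-memory sorted set for one ordered-by column.
    private final TreeSet<Entry> index = new TreeSet<Entry>(BY_VALUE_THEN_KEY);

    // Write a value and keep the index in step with the store.
    void put(String rowKey, long value) {
        Long old = rows.put(rowKey, value);
        if (old != null) {
            index.remove(new Entry(old, rowKey)); // drop the stale entry for this row
        }
        index.add(new Entry(value, rowKey));
    }

    // Return the row keys of the first n rows in column order, touching only n index entries.
    List<String> firstRowKeys(int n) {
        List<String> out = new ArrayList<String>();
        for (Entry e : index) {
            if (out.size() == n) {
                break;
            }
            out.add(e.rowKey);
        }
        return out;
    }

    public static void main(String[] args) {
        OrderedColumnIndexSketch idx = new OrderedColumnIndexSketch();
        idx.put("row-a", 30L);
        idx.put("row-b", 10L);
        idx.put("row-c", 20L);
        System.out.println(idx.firstRowKeys(2));
    }
}
```

      Each returned row key would then be used for a random read to fetch the full row, as the description says.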

      1. hbase-605.patch
        41 kB
        Clint Morgan
      2. hbase-605-v2.patch
        49 kB
        Clint Morgan
      3. hbase-605-v3.patch
        61 kB
        Clint Morgan

        Activity

        Clint Morgan added a comment -

        This patch contains a minimal implementation, and small unit test.

        One known deficiency is that the sorted set is built twice per hregion upon splitting. This is due to hregions being opened then immediately closed upon a split.

        I've tested a bit more in our layers above hbase and it works for me (so far).

        stack added a comment -

        Mark this as patch available so it gets a review

        stack added a comment -

        This patch has much merit if only for the fact that it verifies (after making a few tweaks) that a subclass of HRegionServer is possible.

        + Source is < 80 columns wide in hadoop
        + Should you be subclassing HColumnDescriptor too? Should it be versioned too?
        + Should we instead add accessors to HRS for the data members you changed from private to protected? (leases and requestCount)
        + We need to make HRegion subclassable or at least be configurable about which HStore to use? HTable too (As is they are 'polluted' with your sorted column code)

        Clint Morgan added a comment -

        Updated to trunk and responded to comments. Subclassing minimizes pollution of the core code...

        stack added a comment -

        In TestOrderedScanner, do you want to remove commented out code? Do you want to add a class comment that says this test depends on/uses a 'special' version of HRegionServer? Would suggest that all classes that depend on this custom HRegionServer also get marked appropriately in their class comment (@see?): e.g. OrderedScanner won't work unless it's going against the ordered HRS – same for OrderedHRegion.

        Some classes are missing licenses.

        I suppose package protection prevents you putting all these new classes into a new orderedregionserver package or into a subpackage named regionserver.ordered and client.ordered or some such?

        You need to explain somewhere in javadoc what this OrderedRegionServer is, how it works, and how to enable it. Would suggest that the class comment in the OrderedRegionServer or in the Ordered Interface as good places (otherwise, should I put in place a package.html to which you can add?). What would be great is that the next time someone shows up asking how they can customize regionserver behavior, we can just point them to your OrderedRegionServer javadoc as an example.

        Thanks for adding accessors rather than making data members protected in RegionServer and for making HStore, etc., subclassable.

        Otherwise, the patch looks great.

        Clint Morgan added a comment -

        responded to comments:

        high level overview in javadoc for OrderedRegionServer

        moved stuff into its own packages, client.ordered and regionserver.ordered

        stack added a comment -

        Unfortunately, we're still java5. We probably won't go to java6 as a requirement until hbase 0.3, to match hadoop 0.18. Please purge the java6isms (NavigableSet in SortedColumn).

        Also, I get this compiling:

            [javac] /Users/stack/Documents/checkouts/trunk/src/java/org/apache/hadoop/hbase/LocalHBaseCluster.java:121: cannot find symbol
            [javac] symbol  : constructor IOException(java.lang.Exception)
            [javac] location: class java.io.IOException
            [javac]         throw new IOException(e);
        

        Do you?

        Thanks Clint (I already updated the FAQ to point to OrderedRegionServer as an example of modifying HRegionServer behavior).

        Clint Morgan added a comment -

        Unfortunately I need the NavigableSet to get a descending iterator. SortedSet does not provide this functionality. Could use some 3rd party data structure but, ...

        I don't get the IOException compile error; that's a java6 change too...

        For the time being, I'm inclined to just leave this as a java6 patch and wait until java6 adoption to apply it. Works for me now, and I need to spend time on other things.
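        For reference, the compile error quoted above comes from the IOException(Throwable) constructor, which was added in Java 6. A minimal sketch of a Java 5 compatible way to wrap the cause is shown here. The class IOExceptionWrapSketch and its wrap helper are hypothetical names and not part of the patch or of LocalHBaseCluster.

```java
import java.io.IOException;

// Hypothetical helper; illustrates a Java 5 compatible replacement for new IOException(e).
final class IOExceptionWrapSketch {

    // Build the IOException from the message, then attach the cause with initCause,
    // which Throwable has provided since Java 1.4.
    static IOException wrap(Exception e) {
        IOException ioe = new IOException(e.getMessage());
        ioe.initCause(e);
        return ioe;
    }

    public static void main(String[] args) {
        try {
            throw wrap(new IllegalStateException("region failed to start"));
        } catch (IOException ioe) {
            System.out.println(ioe.getMessage() + " / cause: " + ioe.getCause());
        }
    }
}
```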

        stack added a comment -

        OK Clint. I moved this to 0.3 hbase for now.

        Bryan Duxbury added a comment -

        You can't supply a Comparator to SortedSet that you code to act in reverse?

        Clint Morgan added a comment -

        Nope. A single SortedSet is created/maintained per order-able column. Then we'd like to iterate through it forwards or backwards to respond to the client's scanner request.

        I suppose we could maintain two such SortedSets (forwards and backwards) by inverting the Comparator, but this seems a waste of space and time....
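        The trade-off discussed here can be sketched as follows. This is an illustration only, not code from the patch: the class DescendingIterationSketch and its two methods are hypothetical names. The first method uses NavigableSet.descendingIterator(), which needs Java 6; the second shows the Java 5 alternative of building a second set with an inverted Comparator, which doubles the memory held per ordered column.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of the two options; not taken from SortedColumn in the patch.
final class DescendingIterationSketch {

    // Java 6 and later: one set, walked backwards through the NavigableSet API.
    static List<Integer> descendingJava6(TreeSet<Integer> set) {
        List<Integer> out = new ArrayList<Integer>();
        for (Iterator<Integer> it = set.descendingIterator(); it.hasNext();) {
            out.add(it.next());
        }
        return out;
    }

    // Java 5 alternative: a second set ordered by the inverted Comparator.
    // It must be kept in sync with the forward set on every write.
    static TreeSet<Integer> reversedCopy(Collection<Integer> values) {
        TreeSet<Integer> reversed = new TreeSet<Integer>(Collections.<Integer>reverseOrder());
        reversed.addAll(values);
        return reversed;
    }

    public static void main(String[] args) {
        TreeSet<Integer> forward = new TreeSet<Integer>();
        forward.add(2);
        forward.add(3);
        forward.add(1);
        System.out.println(descendingJava6(forward));
        System.out.println(new ArrayList<Integer>(reversedCopy(forward)));
    }
}
```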

        Clint Morgan added a comment -

        Resolving this issue, as I've decided to go with the indexed-table approach of HBASE-883.


          People

          • Assignee:
            Unassigned
            Reporter:
            Clint Morgan
          • Votes:
            0
            Watchers:
            0
