HBASE-3434

ability to increment a counter without reading original value from storage

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Client, regionserver
    • Labels: None

      Description

      There are a bunch of applications that do read-modify-write operations on HBase constructs, e.g. a counter. The counter value has to be read in from HDFS before it can be incremented. We have an application where the number of increments on a counter far outnumbers the number of times the counter is used or read. For these types of applications, it would be very beneficial not to have to read the counter in from disk before it can be incremented.
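The contrast the description draws — a read-modify-write round trip versus a pure write — can be sketched in plain Java. This is a hypothetical simulation of the idea, not the HBase client API; the class and method names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a counter whose increments are appended as delta
// records (pure writes), with the read path paying the aggregation cost.
public class CounterSketch {
    private final long baseValue;                           // last materialized value
    private final List<Long> deltaLog = new ArrayList<>();  // unread increments

    public CounterSketch(long baseValue) {
        this.baseValue = baseValue;
    }

    // Write-only increment: append a delta record; the old value is never read.
    public void increment(long delta) {
        deltaLog.add(delta);
    }

    // Reads aggregate the base value plus all pending deltas.
    public long read() {
        long sum = baseValue;
        for (long d : deltaLog) sum += d;
        return sum;
    }
}
```

When increments far outnumber reads, as in the application described, the aggregation cost moves to the rare operation.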

        Activity

        Asaf Mesika added a comment -

        Dhurba - did you manage to code this stuff? We've just been hitting the exact same problem. I've seen a very similar design implemented in the HBaseHUT project, but he focused on Puts rather than increments, and decided to place the code in the Client and an M/R job rather than using co-processors.

        Jonathan Gray added a comment -

        I think it remains to be seen exactly where features implemented via coprocessors will live. Certainly there will be some open source home for them, I just think there's some aversion to shipping with and managing a big set of contribs.

        dhruba borthakur added a comment -

        Using co-processors seems to be the right thing to do. If I build it this way, do I still contribute the code back to the Apache HBase svn tree? If so, where will it be located?

        stack added a comment -

        (No worries regards blue-skying Dhruba... go for it).

        If we did not want to alter the fundamentals of HBase, Collections might be done as a Coprocessor instance. Coprocessors have hooks pre/post Get as well as on flush/compact. You'd mark the region to load KeyCollectionCoprocessor. The KCCP would work on ColumnFamilies marked as KeyCollections either of type increment or type List. On flush, we'd write out the aggregating Tombstone (We want to avoid data bloat if we can – just don't put it into the FS beyond the WAL). On Get, we'd aggregate until we hit a tombstone writing back a new tombstone record if "too many" deltas have gone in since the last tombstone.

        Having all this KeyCollection code cohere inside a Coprocessor is a nice way of keeping the code all together rather than spread about the server.

        Otherwise, we could mark an HCD as carrying Collections only and then have the server do extra processing on Get, flush, and compact if the HCD is of this type. If the marking was done as a special KV Type, then we could have Collections live in the same family as plain KVs, if we wanted to do such a thing (I don't think we do, especially if the write-rate of increments is high).
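The Get-path behaviour described above — aggregate deltas newest-to-oldest until a tombstone (rollup point) terminates the scan — could be sketched roughly like this. All names here are hypothetical; this simulates the idea outside of HBase:

```java
// Hypothetical sketch of the Get-path aggregation: cells are ordered
// newest-first; deltas are summed until the first tombstone, which carries
// the rolled-up aggregate of everything older than it.
public class GetPathSketch {
    public static final class Cell {
        final long value;
        final boolean tombstone;  // true = aggregated rollup point

        public Cell(long value, boolean tombstone) {
            this.value = value;
            this.tombstone = tombstone;
        }
    }

    // cells[0] is newest; stop scanning at the first tombstone encountered.
    public static long aggregate(Cell[] cells) {
        long sum = 0;
        for (Cell c : cells) {
            sum += c.value;
            if (c.tombstone) break;  // everything older is already rolled up
        }
        return sum;
    }
}
```

Writing back a new tombstone when "too many" deltas accumulate would then bound the length of this scan.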

        dhruba borthakur added a comment -

        hi stack, Some of these ideas are still raw in my brain, so please excuse me if my writeup is not very coherent.

        There are two ways to mark these new records. One way is to have a new value for KeyValue.Type.Collection. The other option is to have a new field in HColumnDescriptor to say that all columns in that family are of Collection type. Do you have a preference (and why)? The two different types of collection to start with can be Counters and Lists.

        A cell that has the aggregated value is a TombStone (no scans need to go beyond that). A cell that does not have an aggregated value is called a Delta cell. A KeyCollection is made of KVs. The precise type of collection (whether Counters, Lists, etc.) is stored in the Value itself.

        There are two triggers that merge a set of Delta cells to create a TombStone cell. First, a major compaction thread aggregates Deltas into TombStones. Second, when an application makes a Get() call, it aggregates on demand and stores a TombStone.
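The compaction-time trigger could be sketched as follows — a run of Delta cells (back to and including the previous TombStone) collapses into a single new TombStone, after which the older cells are discardable. This is a hypothetical simulation with invented names, not actual compaction code, which operates over HFiles:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the compaction trigger: a run of Delta cell values
// (plus the previous tombstone's value) is replaced by one TombStone cell
// carrying their aggregate.
public class CompactionSketch {
    // Input: delta values newest-to-oldest, ending at (and including) the
    // previous tombstone's value. Output: a one-element list holding the new
    // tombstone, signalling that all older KVs can be safely discarded.
    public static List<Long> compact(List<Long> deltasAndOldTombstone) {
        long total = 0;
        for (long v : deltasAndOldTombstone) total += v;
        List<Long> out = new ArrayList<>();
        out.add(total);  // the new TombStone cell
        return out;
    }
}
```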

        stack added a comment -

        @Dhruba Tell us more about what attributes this 'new record' type will have. How will it differ from a plain cell? Will the cell be marked in some manner if it's an aggregation of all behind it? The server will know to stop looking once it runs up against such a rollup point? Will rollups be done in the background or on a read?

        A KeyCollection is made of KVs? If not, what are its atoms? Or is it a special version of a KV (We have a Type field in KVs so we might be able to add in a Collection type — or List or Counter type if that's what we really want)? To get a view on a Collection, we'd have to fetch all mentions of a particular 'Collection' back as far as the last Collection 'tombstone' instance, then roll it all up to present the current view?

        Good stuff Dhruba.

        ryan rawson added a comment -

        can we layer this on top of the existing infrastructure, instead of making fundamental changes to KeyValue and HFile? Or if we must make changes, let's do it in very minimal ways, e.g. add 1 or 2 more types to KV.

        dhruba borthakur added a comment -

        The idea is that an increment to a counter is recorded in a new record as an "increment" to the counter; it does not need to have the original value. A new type of Get() call will collate all the "increment" records associated with this counter and return the correct value to the application.

        In fact, I would like to discuss how the basic Key-Value construct could be extended to a Key-Collection construct. A Collection has base primitives for adding/deleting/modifying elements; a List and a Counter are two special classes of this Collection construct. In the current implementation, when an application wants to delete an element from a Collection, it reads (R) the List from an HBase key-value, modifies it in memory, and then writes a new serialized Collection back to HBase. If the "Collection" were a basic primitive offered by HBase, the application would just write a new record indicating "delete element x from Collection". This eliminates the step marked (R) above and converts a read-modify-write into a pure write operation. The Counter is just a specialized version of the Collection construct.

        The lazy background compaction process collates all the "operations" on the Collection and materializes a true value of the Collection (let's call it the tombstone). The creation of the tombstone indicates that all KVs with an older timestamp can be safely discarded.

        The benefit is that all random reads are now converted to sequential reads, thus leading to better scalability of storage. Another benefit could be that it might be easier to reduce conflicts while merging records using HBase replication because it is easier to merge operations-logs rather than absolute values!
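The operations-log idea for a List-typed Collection could be sketched like this — mutations are appended as records and the true value is only materialized on read (or at compaction, producing the tombstone). The class and the "+"/"-" encoding are invented for illustration; this is a sketch of the concept, not an HBase implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Key-Collection as an operations log: "add x" and
// "delete x" are appended as records (pure writes, no read-modify-write),
// and the current List value is materialized only when it is read.
public class ListCollectionSketch {
    private final List<String> opLog = new ArrayList<>();

    // Pure writes: no round trip to read the serialized Collection first.
    public void add(String element)    { opLog.add("+" + element); }
    public void delete(String element) { opLog.add("-" + element); }

    // Materialize the current List by replaying the op-log oldest-to-newest.
    public List<String> materialize() {
        List<String> result = new ArrayList<>();
        for (String op : opLog) {
            String element = op.substring(1);
            if (op.charAt(0) == '+') result.add(element);
            else result.remove(element);
        }
        return result;
    }
}
```

A compaction pass would replay the log the same way and write the materialized List back as the tombstone, truncating the log.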

        Jonathan Gray added a comment -

        That will only work if the working set of counters will fit in memory. When that is no longer the case is generally when you also see a rapid decline in performance and throughput.

        Also, the in-memory setting is not "in memory only" but just gives blocks from that family a higher priority in the LRU.

        Jeff Hammerbacher added a comment -

        Can you just put the column in a column family that's in memory only?


  People

  • Assignee: dhruba borthakur
  • Reporter: dhruba borthakur
  • Votes: 3
  • Watchers: 12