Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.92.0
    • Component/s: Coprocessors
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      From Google's Jeff Dean, in a keynote to LADIS 2009 (http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, slides 66 - 67):

      BigTable Coprocessors (New Since OSDI'06)

      • Arbitrary code that runs next to each tablet in a table
        • As tablets split and move, coprocessor code automatically splits/moves too
      • High-level call interface for clients
        • Unlike RPC, calls addressed to rows or ranges of rows
        • Coprocessor client library resolves to actual locations
        • Calls across multiple rows automatically split into multiple parallelized RPCs
      • Very flexible model for building distributed services
        • Automatic scaling, load balancing, request routing for apps
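
      The row-range addressing described in the slide above can be illustrated with a small sketch. It is not from the keynote and is not BigTable or HBase code: the tablet boundaries in START_KEYS and the functions tablets_for_range, call_on_tablet, and coprocessor_call are hypothetical names, and a thread pool stands in for the parallel RPCs.

```python
from bisect import bisect_left, bisect_right
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tablet layout (an assumption for illustration): sorted start
# keys, where tablet i covers [START_KEYS[i], START_KEYS[i + 1]) and the last
# tablet is unbounded above.
START_KEYS = ["", "g", "n", "t"]


def tablets_for_range(start_row, stop_row):
    """Return the indices of tablets overlapping [start_row, stop_row)."""
    first = bisect_right(START_KEYS, start_row) - 1  # tablet containing start_row
    last = bisect_left(START_KEYS, stop_row) - 1     # last tablet starting before stop_row
    return list(range(first, last + 1))


def call_on_tablet(tablet, start_row, stop_row):
    """Stand-in for one RPC: report the sub-range this tablet would serve."""
    lo = max(start_row, START_KEYS[tablet])
    if tablet + 1 < len(START_KEYS):
        hi = min(stop_row, START_KEYS[tablet + 1])
    else:
        hi = stop_row
    return (tablet, lo, hi)


def coprocessor_call(start_row, stop_row):
    """Split one row-range call into one call per tablet, issued in parallel."""
    tablets = tablets_for_range(start_row, stop_row)
    with ThreadPoolExecutor(max_workers=max(1, len(tablets))) as pool:
        # pool.map returns results in the same order as the input tablets.
        return list(pool.map(lambda t: call_on_tablet(t, start_row, stop_row), tablets))
```

      With this layout, coprocessor_call("h", "p") fans out to tablets 1 and 2, and each receives only the part of the range it hosts. The caller names rows, never servers.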

      Example Coprocessor Uses

      • Scalable metadata management for Colossus (next gen GFS-like file system)
      • Distributed language model serving for machine translation system
      • Distributed query processing for full-text indexing support
      • Regular expression search support for code repository

      For HBase, adding a coprocessor framework will allow for pluggable, incremental addition of functionality. There will be no more need to subclass the regionserver interface and implementation classes and set hbase.regionserver.class and hbase.regionserver.impl in hbase-site.xml. That mechanism allows for extension, but only one extension at a time, to the exclusion of all others.

      Also, HBASE-2001 currently contains an in-process MapReduce framework for the regionservers. Coprocessors can optionally implement a 'MapReduce' interface, which clients will be able to invoke concurrently on all regions of the table. Note that this is not MapReduce on the table; it is MapReduce on each region, concurrently. One can implement MapReduce in a manner very similar to Hadoop's MR framework, or use shared variables to avoid the overhead of generating (and processing) a lot of intermediate data. An initial application of this could be support for rapid calculation of aggregates over data stored in HBase.
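
      As a rough illustration of the per-region aggregate idea, the sketch below computes an average by folding each region into a partial result and combining the partials on the client. It is a toy model, not the HBASE-2001 code or any HBase API: REGIONS, region_aggregate, and table_average are hypothetical names, in-memory dicts stand in for regions, and a thread pool stands in for concurrent invocation on the regionservers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table (an assumption for illustration): each region is a dict
# mapping row key -> numeric value.
REGIONS = [
    {"a": 3, "c": 5},
    {"h": 2},
    {"p": 7, "q": 1, "z": 4},
]


def region_aggregate(rows):
    """Runs "next to" one region: fold its rows into a partial (sum, count).

    A running accumulator plays the role of the shared variables mentioned
    above, so no per-row intermediate pairs are generated.
    """
    total = 0
    count = 0
    for value in rows.values():
        total += value
        count += 1
    return total, count


def table_average(regions):
    """Invoke the aggregate on every region concurrently, then combine."""
    with ThreadPoolExecutor(max_workers=max(1, len(regions))) as pool:
        partials = list(pool.map(region_aggregate, regions))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None
```

      Only one small (sum, count) pair per region crosses the network in this model, which is the appeal of running the aggregate beside the data rather than scanning every row from the client.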

          Activity

          stack added a comment -

          Or maybe we can't because some of the issues remain undone? But maybe enough has been done?

          stack added a comment -

          Andrew, can we close this issue now?

          stack added a comment -

          Moving out of 0.92.0. Pull it back in if you think different.

          stack added a comment -

          Moved from 0.21 to 0.22 just after merge of old 0.20 branch into TRUNK.

          Andrew Purtell added a comment -

          @Kay Kay: HBASE-2469, have at it.

          Karthik K added a comment -

          Would it be OK to have a subtask under this to discuss distributed query processing using coprocessors, which hbasene (http://github.com/akkumar/hbasene) intends to use to optimize query processing?

          stack added a comment -

          That's a sweet citation, Andrew. It has an update on BT and a few things on how they do replication.


            People

            • Assignee: Unassigned
            • Reporter: Andrew Purtell
            • Votes: 6
            • Watchers: 44
