• Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.92.0
    • Component/s: Coprocessors
    • Labels:
    • Hadoop Flags:


      From Google's Jeff Dean, in a keynote at LADIS 2009 (slides 66-67):

      BigTable Coprocessors (New Since OSDI'06)

      • Arbitrary code that runs next to each tablet in table
        • As tablets split and move, coprocessor code automatically splits/moves too
      • High-level call interface for clients
        • Unlike RPC, calls addressed to rows or ranges of rows
        • Coprocessor client library resolves calls to actual locations
        • Calls across multiple rows automatically split into multiple parallelized RPCs
      • Very flexible model for building distributed services
        • Automatic scaling, load balancing, request routing for apps
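
      The client-side routing described above (calls addressed to row ranges, resolved to locations, split into parallel calls) can be sketched as follows. This is a minimal illustration under assumed names, not the BigTable or HBase API: `REGION_STARTS`, `split_range`, and `call_range` are hypothetical, regions are modeled as a sorted list of start keys, and a thread pool stands in for parallelized RPCs.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor

# Hypothetical region layout: region i covers [REGION_STARTS[i], REGION_STARTS[i + 1]).
# The last region is open-ended.
REGION_STARTS = [b"", b"g", b"n", b"t"]


def split_range(start, stop, region_starts=REGION_STARTS):
    """Split the row range [start, stop) into one sub-range per region it touches."""
    # Index of the region containing `start`.
    first = bisect.bisect_right(region_starts, start) - 1
    pieces = []
    for i in range(first, len(region_starts)):
        region_start = region_starts[i]
        region_end = region_starts[i + 1] if i + 1 < len(region_starts) else None
        if region_start >= stop:
            break  # this region (and all later ones) lies past the requested range
        lo = max(start, region_start)
        hi = stop if region_end is None else min(stop, region_end)
        pieces.append((i, lo, hi))
    return pieces


def call_range(start, stop, fn, region_starts=REGION_STARTS):
    """Invoke fn(region_index, lo, hi) on every overlapping region in parallel.

    In a real system fn would be an RPC to the server hosting that region;
    here it is just a local callable (hypothetical stand-in).
    """
    pieces = split_range(start, stop, region_starts)
    with ThreadPoolExecutor(max_workers=max(1, len(pieces))) as pool:
        futures = [pool.submit(fn, i, lo, hi) for i, lo, hi in pieces]
        return [f.result() for f in futures]
```

      For example, a call over rows `b"c"` to `b"p"` is split into three sub-calls, one each for the regions starting at `b""`, `b"g"`, and `b"n"`, and the results come back in region order.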

      Example Coprocessor Uses

      • Scalable metadata management for Colossus (next gen GFS-like file system)
      • Distributed language model serving for machine translation system
      • Distributed query processing for full-text indexing support
      • Regular expression search support for code repository

      For HBase, adding a coprocessor framework will allow functionality to be added incrementally through pluggable extensions. There will be no more need to subclass the regionserver interface and implementation classes and set hbase.regionserver.class and hbase.regionserver.impl in hbase-site.xml. That mechanism allows for extension, but only one extension at a time, to the exclusion of all others.
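
      A minimal sketch of why a coprocessor framework composes where a single regionserver subclass cannot: a region keeps a list of loaded extensions and runs each one's hook in turn, so independent extensions can coexist. The `Coprocessor`, `Region`, `load`, `pre_put`, `UpperCase`, and `AuditLog` names are hypothetical, chosen for illustration only, and are not the HBase coprocessor API.

```python
from typing import Dict, List


class Coprocessor:
    """Hypothetical base class: subclasses override only the hooks they need."""

    def pre_put(self, row: bytes, value: bytes) -> bytes:
        return value  # default hook passes the value through unchanged


class Region:
    """Toy region that runs every loaded coprocessor before storing a put."""

    def __init__(self) -> None:
        self.rows: Dict[bytes, bytes] = {}
        self.coprocessors: List[Coprocessor] = []

    def load(self, cp: Coprocessor) -> None:
        # Any number of extensions can be loaded; none excludes the others.
        self.coprocessors.append(cp)

    def put(self, row: bytes, value: bytes) -> None:
        # Hooks run in load order; each sees the value returned by the previous one.
        for cp in self.coprocessors:
            value = cp.pre_put(row, value)
        self.rows[row] = value


class UpperCase(Coprocessor):
    """Hypothetical extension that rewrites values."""

    def pre_put(self, row: bytes, value: bytes) -> bytes:
        return value.upper()


class AuditLog(Coprocessor):
    """Hypothetical extension that only records which rows were written."""

    def __init__(self) -> None:
        self.seen: List[bytes] = []

    def pre_put(self, row: bytes, value: bytes) -> bytes:
        self.seen.append(row)
        return value
```

      With both `UpperCase` and `AuditLog` loaded into one region, a put of `b"abc"` to row `b"r1"` is stored as `b"ABC"` and also recorded by the audit extension; with the subclassing approach, only one of these behaviors could be installed at a time.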

      Also, in HBASE-2001 there is currently an in-process MapReduce framework for the regionservers. Coprocessors can optionally implement a 'MapReduce' interface, which clients will be able to invoke concurrently on all regions of the table. Note that this is not MapReduce over the table; it is MapReduce on each region, concurrently. One can implement MapReduce in a manner very similar to Hadoop's MR framework, or use shared variables to avoid the overhead of generating (and processing) many intermediate records. An initial application of this could be rapid calculation of aggregates over data stored in HBase.
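
      The per-region aggregation idea can be sketched as below, assuming toy in-memory regions. Each region folds its rows into a single (count, sum) pair, a stand-in for the shared-variable approach, so no per-row intermediates are produced; the client then combines the partials into a table-wide average. `REGIONS`, `region_partial`, and `table_average` are hypothetical names, and the thread pool only models concurrent invocation on all regions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: the numeric values stored in each region of a table.
REGIONS = {
    "region-a": [3, 5, 8],
    "region-b": [1, 1],
    "region-c": [10, 2, 4, 6],
}


def region_partial(values):
    """Runs next to one region: fold all rows into one (count, sum) pair
    instead of emitting an intermediate record per row."""
    count, total = 0, 0
    for v in values:
        count += 1
        total += v
    return count, total


def table_average(regions):
    """Invoke region_partial on every region concurrently, then combine."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(region_partial, regions.values()))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return total / count if count else None  # None for an empty table
```

      Combining (count, sum) pairs rather than per-region averages keeps the result exact even when regions hold different numbers of rows; for `REGIONS` above the combined result is 40 / 9.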





              • Assignee: Andrew Purtell (apurtell)
              • Votes: 6
              • Watchers: 39


                • Created: