HBase / HBASE-2000




    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.92.0
    • Component/s: Coprocessors
    • Labels: None
    • Hadoop Flags: Reviewed


      From Google's Jeff Dean, in a keynote to LADIS 2009 (http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, slides 66 - 67):

      BigTable Coprocessors (New Since OSDI'06)

      • Arbitrary code that runs next to each tablet in a table
        • As tablets split and move, coprocessor code automatically splits/moves too
      • High-level call interface for clients
        • Unlike RPC, calls addressed to rows or ranges of rows
          • Coprocessor client library resolves to actual locations
        • Calls across multiple rows automatically split into multiple parallelized RPCs
      • Very flexible model for building distributed services
        • Automatic scaling, load balancing, request routing for apps

      Example Coprocessor Uses

      • Scalable metadata management for Colossus (next gen GFS-like file system)
      • Distributed language model serving for machine translation system
      • Distributed query processing for full-text indexing support
      • Regular expression search support for code repository
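
The call model in the slides above (calls addressed to a row range, resolved to locations by a client library, then split into parallel per-tablet requests) can be sketched roughly as follows. This is only an in-memory illustration under stated assumptions: `RowRangeCallSketch`, `addRegion`, `put`, and `callOnRange` are hypothetical names, not part of Bigtable or HBase, and local threads stand in for the RPCs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical in-memory stand-in; none of these names are HBase or Bigtable APIs.
final class RowRangeCallSketch {
    // Region start key -> sorted rows held by that region.
    private final TreeMap<String, TreeMap<String, Long>> regionsByStartKey = new TreeMap<>();

    RowRangeCallSketch() {
        addRegion(""); // the first region always starts at the empty key
    }

    void addRegion(String startKey) {
        regionsByStartKey.put(startKey, new TreeMap<>());
    }

    void put(String row, long value) {
        // The owning region is the one with the greatest start key <= row.
        regionsByStartKey.floorEntry(row).getValue().put(row, value);
    }

    // Runs `call` once per region overlapping [startRow, endRow), in parallel,
    // and returns the per-region results in region order.
    <R> List<R> callOnRange(String startRow, String endRow,
                            Function<Map<String, Long>, R> call) {
        String first = regionsByStartKey.floorKey(startRow);
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (TreeMap<String, Long> region
                    : regionsByStartKey.subMap(first, true, endRow, false).values()) {
                // Each region only sees its own rows that fall inside the range.
                Map<String, Long> slice = region.subMap(startRow, true, endRow, false);
                Callable<R> task = () -> call.apply(slice);
                pending.add(pool.submit(task)); // stands in for one RPC per region
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : pending) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        RowRangeCallSketch table = new RowRangeCallSketch();
        table.addRegion("g");
        table.addRegion("p");
        table.put("apple", 1);
        table.put("grape", 2);
        table.put("kiwi", 3);
        table.put("pear", 4);
        table.put("zebra", 5);
        // Sum the values in [b, q): one partial sum per overlapping region.
        System.out.println(table.callOnRange("b", "q",
                rows -> rows.values().stream().mapToLong(Long::longValue).sum()));
    }
}
```

The caller names only a row range; which regions are involved, and how many parallel requests that implies, stays inside the library. That is the property that lets regions split and move without the caller changing.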

      For HBase, a coprocessor framework would allow functionality to be added incrementally as plug-ins. There would be no more need to subclass the regionserver interface and implementation classes and set hbase.regionserver.class and hbase.regionserver.impl in hbase-site.xml. That mechanism allows extension, but only one extension at a time, to the exclusion of all others.

      Also, HBASE-2001 currently contains an in-process MapReduce framework for the regionservers. Coprocessors can optionally implement a 'MapReduce' interface, which clients will be able to invoke concurrently on all regions of the table. Note that this is not MapReduce on the table; it is MapReduce on each region, concurrently. One can implement MapReduce in a manner very similar to Hadoop's MR framework, or use shared variables to avoid the overhead of generating (and processing) many intermediate results. An initial application of this could be rapid calculation of aggregates over data stored in HBase.
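
To make the two styles concrete, here is a minimal sketch of a per-region aggregate (a sum) computed concurrently on every region. It assumes a hypothetical `RegionMapReduce` interface and plain in-memory lists standing in for regions; it is not the interface proposed in HBASE-2001, only an illustration of Hadoop-like intermediates versus a shared variable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch; RegionMapReduce is not an actual HBase interface.
final class PerRegionMapReduceSketch {
    /** What a coprocessor might implement: map and reduce over ONE region's values. */
    interface RegionMapReduce {
        long run(List<Long> regionValues);
    }

    // Hadoop-like style: the map phase emits intermediates, the reduce phase folds them.
    static final RegionMapReduce WITH_INTERMEDIATES = values -> {
        List<Long> emitted = new ArrayList<>();
        for (long v : values) {
            emitted.add(v);            // map: emit one intermediate per value
        }
        long sum = 0;
        for (long v : emitted) {
            sum += v;                  // reduce: fold the intermediates
        }
        return sum;
    };

    // Shared-variable style: accumulate directly, with no intermediate collection.
    static final RegionMapReduce WITH_SHARED_VARIABLE = values -> {
        LongAdder sum = new LongAdder();   // safe to update from several threads
        values.parallelStream().forEach(sum::add);
        return sum.sum();
    };

    // Client side: invoke the job on every region concurrently, then merge
    // the per-region results. Each region is processed independently.
    static long invokeOnAllRegions(List<List<Long>> regions, RegionMapReduce job) {
        List<CompletableFuture<Long>> perRegion = new ArrayList<>();
        for (List<Long> region : regions) {
            perRegion.add(CompletableFuture.supplyAsync(() -> job.run(region)));
        }
        long total = 0;
        for (CompletableFuture<Long> f : perRegion) {
            total += f.join();
        }
        return total;
    }
}
```

Both jobs return the same per-region result; the difference is only whether the region-side code materializes intermediates before reducing them.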




            Assignee: Unassigned
            Reporter: Andrew Kyle Purtell (apurtell)