Details
- New Feature
- Status: Closed
- Major
- Resolution: Fixed
- None
- None
- Reviewed
Description
From Google's Jeff Dean, in a keynote to LADIS 2009 (http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, slides 66 - 67):
BigTable Coprocessors (New Since OSDI'06)
- Arbitrary code that runs next to each tablet in table
  - As tablets split and move, coprocessor code automatically splits/moves too
- High-level call interface for clients
  - Unlike RPC, calls addressed to rows or ranges of rows
    - coprocessor client library resolves to actual locations
  - Calls across multiple rows automatically split into multiple parallelized RPCs
- Very flexible model for building distributed services
  - Automatic scaling, load balancing, request routing for apps
Example Coprocessor Uses
- Scalable metadata management for Colossus (next gen GFS-like file system)
- Distributed language model serving for machine translation system
- Distributed query processing for full-text indexing support
- Regular expression search support for code repository
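The call interface in the slides (calls addressed to row ranges, resolved by a client library and split into parallel RPCs) can be sketched roughly as follows. This is only an illustration under assumptions: `RowRangeRouter`, `SubCall`, `addTablet`, `split`, and `invoke` are hypothetical names, not part of Bigtable or HBase, and the "RPC" is a plain function call run on a background thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical sketch: not Bigtable's or HBase's actual client API. */
class RowRangeRouter {
    /** One piece of a row-range call, clipped to a single tablet. */
    static final class SubCall {
        final String server, startRow, endRow;
        SubCall(String server, String startRow, String endRow) {
            this.server = server;
            this.startRow = startRow;
            this.endRow = endRow;
        }
        @Override public String toString() {
            return server + "[" + startRow + "," + endRow + ")";
        }
    }

    // Tablet start key -> hosting server. Assumes the first tablet starts at "".
    private final TreeMap<String, String> tabletsByStartKey = new TreeMap<>();

    void addTablet(String startKey, String server) {
        tabletsByStartKey.put(startKey, server);
    }

    /** Splits [startRow, endRow) into one sub-call per overlapping tablet (assumes startRow < endRow). */
    List<SubCall> split(String startRow, String endRow) {
        List<SubCall> calls = new ArrayList<>();
        // The tablet holding startRow has the greatest start key <= startRow.
        String first = tabletsByStartKey.floorKey(startRow);
        for (Map.Entry<String, String> e
                : tabletsByStartKey.subMap(first, true, endRow, false).entrySet()) {
            String tabletStart = e.getKey();
            String nextStart = tabletsByStartKey.higherKey(tabletStart); // null for the last tablet
            String from = tabletStart.compareTo(startRow) > 0 ? tabletStart : startRow;
            String to = (nextStart == null || nextStart.compareTo(endRow) > 0) ? endRow : nextStart;
            calls.add(new SubCall(e.getValue(), from, to));
        }
        return calls;
    }

    /** Runs fn for every sub-call concurrently and returns the results in row order. */
    <R> List<R> invoke(String startRow, String endRow, Function<SubCall, R> fn) {
        List<CompletableFuture<R>> futures = new ArrayList<>();
        for (SubCall c : split(startRow, endRow)) {
            futures.add(CompletableFuture.supplyAsync(() -> fn.apply(c)));
        }
        List<R> results = new ArrayList<>();
        for (CompletableFuture<R> f : futures) {
            results.add(f.join());
        }
        return results;
    }

    public static void main(String[] args) {
        RowRangeRouter router = new RowRangeRouter();
        router.addTablet("", "server1");
        router.addTablet("g", "server2");
        router.addTablet("p", "server3");
        System.out.println(router.split("c", "k"));  // prints [server1[c,g), server2[g,k)]
        System.out.println(router.invoke("c", "z", call -> call.server));
    }
}
```

The caller names only rows; the sorted map of tablet start keys stands in for whatever location lookup a real client library would perform.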
For HBase, adding a coprocessor framework will allow functionality to be added incrementally as plug-ins. There will be no more need to subclass the regionserver interface and implementation classes and set hbase.regionserver.class and hbase.regionserver.impl in hbase-site.xml. That mechanism allows for one extension, but to the exclusion of all others.
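To illustrate the difference from replacing the regionserver class, here is a minimal sketch of a host that chains several independently loaded extensions. The names `CoprocessorHostSketch`, `Extension`, `load`, and `beforePut` are assumptions made for this example and are not the HBase coprocessor API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a pluggable extension chain; not the HBase coprocessor API. */
class CoprocessorHostSketch {
    /** An extension sees each put before it is applied and may rewrite the value. */
    interface Extension {
        String beforePut(String row, String value);
    }

    private final List<Extension> extensions = new ArrayList<>();

    /** Extensions are added one at a time and run in registration order. */
    void load(Extension extension) {
        extensions.add(extension);
    }

    /** Passes the value through every loaded extension and returns the final value. */
    String put(String row, String value) {
        String current = value;
        for (Extension extension : extensions) {
            current = extension.beforePut(row, current);
        }
        return current; // a real server would now write this value to the store
    }

    public static void main(String[] args) {
        CoprocessorHostSketch host = new CoprocessorHostSketch();
        host.load((row, value) -> value.trim());        // first extension
        host.load((row, value) -> value.toUpperCase()); // second, independent extension
        System.out.println(host.put("row1", "  hello ")); // prints HELLO
    }
}
```

Because the host keeps a list rather than a single implementation class, loading one extension does not rule out loading another.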
Also, HBASE-2001 currently contains an in-process MapReduce framework for the regionservers. Coprocessors can optionally implement a 'MapReduce' interface which clients will be able to invoke concurrently on all regions of the table. Note this is not MapReduce on the table; this is MapReduce on each region, concurrently. One can implement MapReduce in a manner very similar to Hadoop's MR framework, or use shared variables to avoid the overhead of generating (and processing) a lot of intermediates. An initial application of this could be support for rapid calculation of aggregates over data stored in HBase.
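As a rough sketch of the per-region idea applied to an aggregate, the following computes a sum by running a map step on each region concurrently and merging the partial results. Everything here is an assumption for illustration: `RegionAggregateSketch`, `RegionMapReduce`, `mapRegion`, `reduce`, and `invokeOnAllRegions` are hypothetical names, regions are modeled as in-memory lists, and the interface in HBASE-2001 may look quite different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch; the interface in HBASE-2001 may look different. */
class RegionAggregateSketch {
    /** Runs once per region over that region's values; partial results are merged afterwards. */
    interface RegionMapReduce<V, R> {
        R mapRegion(List<V> regionValues);
        R reduce(R left, R right);
    }

    /** Sums a region with a running total instead of emitting intermediate pairs. */
    static final RegionMapReduce<Long, Long> SUM = new RegionMapReduce<Long, Long>() {
        @Override public Long mapRegion(List<Long> regionValues) {
            long total = 0;
            for (long v : regionValues) {
                total += v;
            }
            return total;
        }
        @Override public Long reduce(Long left, Long right) {
            return left + right;
        }
    };

    /** Invokes the job on every region concurrently (assumes at least one region). */
    static <V, R> R invokeOnAllRegions(List<List<V>> regions, RegionMapReduce<V, R> job) {
        List<CompletableFuture<R>> partials = new ArrayList<>();
        for (List<V> region : regions) {
            partials.add(CompletableFuture.supplyAsync(() -> job.mapRegion(region)));
        }
        R result = partials.get(0).join();
        for (int i = 1; i < partials.size(); i++) {
            result = job.reduce(result, partials.get(i).join());
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Long>> regions = Arrays.asList(
                Arrays.asList(1L, 2L, 3L),
                Arrays.asList(10L, 20L),
                Arrays.asList(100L));
        System.out.println(invokeOnAllRegions(regions, SUM)); // prints 136
    }
}
```

Each region returns a single partial value, so the client merges one number per region rather than one per row.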
Issue Links
- is blocked by
  - HBASE-2321 Support RPC interface changes at runtime (Closed)
- relates to
  - HBASE-1845 MultiGet, MultiDelete, and MultiPut - batched to the appropriate region servers (Closed)
  - HBASE-1935 Scan in parallel (Closed)
  - HBASE-2893 Table metacolumns (Closed)
  - HBASE-3340 Eventually Consistent Secondary Indexing via Coprocessors (Closed)
  - HBASE-3341 Increment Row-Level Group Commit via Coprocessors (Closed)
  - HBASE-3342 Server-side Row-level Inverted Index Join via Coprocessors (Closed)
  - HBASE-74 [performance] When a get or scan request spans multiple columns, execute the reads in parallel (Closed)