[HBASE-2000] Coprocessors - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.92.0
Component/s: Coprocessors
Labels: None
Hadoop Flags: Reviewed

Description

From Google's Jeff Dean, in a keynote to LADIS 2009 (http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, slides 66 - 67):

BigTable Coprocessors (New Since OSDI'06)

Arbitrary code that runs next to each tablet in table
- As tablets split and move, coprocessor code automatically splits/moves too

High-level call interface for clients
- Unlike RPC, calls addressed to rows or ranges of rows; coprocessor client library resolves to actual locations
- Calls across multiple rows automatically split into multiple parallelized RPCs

Very flexible model for building distributed services
- Automatic scaling, load balancing, request routing for apps

Example Coprocessor Uses

Scalable metadata management for Colossus (next gen GFS-like file system)

Distributed language model serving for machine translation system

Distributed query processing for full-text indexing support

Regular expression search support for code repository
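
The routing model in these slides (calls addressed to rows or row ranges, resolved by a client library to tablet/region locations and fanned out as parallel RPCs) can be illustrated with a small in-memory sketch. It is not the HBase or Bigtable client API: `Region`, `locate`, and `call_range` are hypothetical names, regions are plain dictionaries keyed by row, and a thread pool stands in for parallel RPCs.

```python
# Hypothetical sketch: a call addressed to a row range is split by region and
# the per-region pieces run in parallel. Not the HBase or Bigtable client API.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Rows = Dict[str, int]


@dataclass
class Region:
    start_key: str  # inclusive; "" means start of table
    end_key: str    # exclusive; "" means end of table
    rows: Rows = field(default_factory=dict)

    def run(self, fn: Callable[[Rows], Any], lo: str, hi: str) -> Any:
        # "Server side": apply fn only to this region's rows inside [lo, hi).
        selected = {k: v for k, v in self.rows.items()
                    if k >= lo and (hi == "" or k < hi)}
        return fn(selected)


def locate(regions: List[Region], lo: str, hi: str) -> List[Region]:
    # Keep regions whose [start_key, end_key) overlaps the requested [lo, hi).
    return [r for r in regions
            if (r.end_key == "" or lo < r.end_key)
            and (hi == "" or r.start_key < hi)]


def call_range(regions: List[Region], fn: Callable[[Rows], Any],
               lo: str, hi: str) -> List[Any]:
    # One parallel call per overlapping region; results come back in key order.
    targets = locate(regions, lo, hi)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda r: r.run(fn, lo, hi), targets))


regions = [
    Region("", "g", {"apple": 1, "fig": 2}),
    Region("g", "p", {"grape": 3, "kiwi": 4}),
    Region("p", "", {"pear": 5, "zucchini": 6}),
]
print(call_range(regions, lambda rows: sum(rows.values()), "b", "q"))  # [2, 7, 5]
```

The caller only names a row range; if a region splits or moves, only the `regions` list changes, which is the property the slides emphasize.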

For HBase, adding a coprocessor framework will allow functionality to be added incrementally as plugins. There will no longer be any need to subclass the regionserver interface and implementation classes and set hbase.regionserver.class and hbase.regionserver.impl in hbase-site.xml. That mechanism allows one extension, but to the exclusion of all others.
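
To make the contrast concrete, here is a minimal sketch, assuming a hypothetical host that loads any number of observer plugins and calls their hooks around a region operation. `CoprocessorHost`, `RegionObserver`, `pre_put`, and `post_put` are illustrative names chosen for this sketch, not HBase's actual classes; the point is only that two independent extensions can coexist, whereas overriding hbase.regionserver.class lets just one subclass win.

```python
# Hypothetical sketch of pluggable observers; names are illustrative only and
# do not reproduce HBase's coprocessor API.
from typing import Dict, List, Tuple


class RegionObserver:
    """Base observer: default hooks do nothing, so plugins override only what they need."""

    def pre_put(self, row: str, value: str) -> str:
        return value

    def post_put(self, row: str, value: str) -> None:
        pass


class CoprocessorHost:
    def __init__(self) -> None:
        self.observers: List[RegionObserver] = []
        self.store: Dict[str, str] = {}

    def load(self, observer: RegionObserver) -> None:
        self.observers.append(observer)

    def put(self, row: str, value: str) -> None:
        for obs in self.observers:   # each plugin may transform the value in turn
            value = obs.pre_put(row, value)
        self.store[row] = value      # the base region operation
        for obs in self.observers:   # each plugin is notified afterwards
            obs.post_put(row, value)


class UpperCase(RegionObserver):
    def pre_put(self, row: str, value: str) -> str:
        return value.upper()


class AuditLog(RegionObserver):
    def __init__(self) -> None:
        self.entries: List[Tuple[str, str]] = []

    def post_put(self, row: str, value: str) -> None:
        self.entries.append((row, value))


host = CoprocessorHost()
audit = AuditLog()
host.load(UpperCase())
host.load(audit)
host.put("r1", "hello")
print(host.store, audit.entries)  # {'r1': 'HELLO'} [('r1', 'HELLO')]
```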

Also, HBASE-2001 currently provides an in-process MapReduce framework for the regionservers. Coprocessors can optionally implement a 'MapReduce' interface, which clients will be able to invoke concurrently on all regions of the table. Note that this is not MapReduce over the table as a whole; it is MapReduce on each region, run concurrently. One can implement MapReduce in a manner very similar to Hadoop's MR framework, or use shared variables to avoid the overhead of generating (and processing) many intermediate records. An initial application could be rapid calculation of aggregates over data stored in HBase.
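
One way to avoid generating many intermediates is sketched below under stated assumptions: each region runs a local aggregation concurrently and returns a single (sum, count) pair, and the client merges those pairs. Regions are modeled as hypothetical in-memory dictionaries and a thread pool stands in for concurrent per-region execution; `region_aggregate`, `merge`, and `table_average` are illustrative names, not part of HBASE-2001 or any HBase API.

```python
# Hypothetical sketch of per-region partial aggregation merged on the client.
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Partial = Tuple[float, int]  # (sum, count) computed inside one region


def region_aggregate(rows: Dict[str, float]) -> Partial:
    # Runs next to one region's data; only two numbers leave the region.
    total, count = 0.0, 0
    for v in rows.values():
        total += v
        count += 1
    return total, count


def merge(partials: List[Partial]) -> Partial:
    # Client side: sums and counts combine correctly across regions.
    return sum(p[0] for p in partials), sum(p[1] for p in partials)


def table_average(regions: List[Dict[str, float]]) -> float:
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(region_aggregate, regions))
    total, count = merge(partials)
    return total / count if count else float("nan")


regions = [{"a": 1.0, "b": 2.0}, {"c": 3.0}, {"d": 4.0, "e": 5.0, "f": 6.0}]
print(table_average(regions))  # 3.5
```

Returning (sum, count) rather than a per-region average is the design choice that matters here: averages of regions with different row counts cannot be combined directly, but sums and counts can.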

Issue Links

is blocked by

HBASE-2321 Support RPC interface changes at runtime

Closed

relates to

HBASE-1845 MultiGet, MultiDelete, and MultiPut - batched to the appropriate region servers

Closed

HBASE-1935 Scan in parallel

Closed

HBASE-2893 Table metacolumns

Closed

HBASE-3340 Eventually Consistent Secondary Indexing via Coprocessors

Closed

HBASE-3341 Increment Row-Level Group Commit via Coprocessors

Closed

HBASE-3342 Server-side Row-level Inverted Index Join via Coprocessors

Closed

HBASE-74 [performance] When a get or scan request spans multiple columns, execute the reads in parallel

Closed

Sub-Tasks

1.	Coprocessors: Client side support	Closed	Gary Helmling
2.	Coprocessors: Colocate user code with regions	Closed	Mingjie Lai
3.	Allow Observers to completely override base function	Closed	Andrew Kyle Purtell
4.	Coprocessors: Lifecycle management	Closed	Gary Helmling
5.	Coprocessors: Coprocessor host and observer for HMaster	Closed	Gary Helmling
6.	Coprocessors: Extend server side integration API to include HLog operations	Closed	Mingjie Lai
7.	Coprocessors: Distributed query processing	Closed	Unassigned
8.	Coprocessors: Support aggregate functions	Closed	Himanshu Vashishtha
9.	Coprocessors: Support small query language as filter on server side	Closed	Unassigned

People

Assignee: Unassigned

Reporter: Andrew Kyle Purtell

Votes: 6

Watchers: 39

Dates

Created: 21/Nov/09 16:49

Updated: 20/Nov/15 13:01

Resolved: 11/Jun/11 14:12