Details
- Type: Brainstorming
- Status: Closed
- Priority: Major
- Resolution: Later
Description
Consider a computational framework based on a stream processing model. Logically:
- Generators emit keys (row keys, or full keys with row + column:qualifier). A generator can be an observer on a region, anchoring a continuous process, or an iterator serving as the first stage of a pipeline constructed on demand with a terminating condition (like a Hadoop task).
- Fetch operators join keys to data fetched from the region.
- Filters drop items according to (perhaps complex) matching on the keys and/or values. Filters could be stateful or stateless: stateless filters can handle data arriving in any order; stateful filters would be used with an ordered generator.
- Combiners perform aggregation.
- Mutators change values; decorators add data.
- Sinks do something useful with items arriving from the stream, e.g. insert into a response buffer, commit to the region, or replicate to a peer.
- Pipelines execute in parallel, and partitioners can split streams for multithreading.

This is kind of like Cascading running within regionserver processes: a nice model, even if the implementation is not literally Cascading. MapReduce is a subset of this model and can be supported by it. Data can be ordered or unordered, depending on the generator.
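The operator chain described above (generator → fetch → filter → combiner → sink) can be sketched as follows. This is only an illustration of the model under stated assumptions; all names here are hypothetical and none of this is an actual HBase coprocessor API.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Hypothetical sketch of the proposed stream-operator model.
public class PipelineSketch {

    // Generator: emits row keys (here, an on-demand iterator stage).
    static Stream<String> generator(List<String> rowKeys) {
        return rowKeys.stream();
    }

    // Fetch operator: joins each key to data "fetched from the region"
    // (a plain Map stands in for the region here).
    static Function<String, Map.Entry<String, Long>> fetch(Map<String, Long> region) {
        return key -> Map.entry(key, region.getOrDefault(key, 0L));
    }

    // Stateless filter: drops items by matching on the value,
    // so it can handle data arriving in any order.
    static Predicate<Map.Entry<String, Long>> atLeast(long min) {
        return e -> e.getValue() >= min;
    }

    // Pipeline: generator -> fetch -> filter -> combiner (sum).
    // A sink would then deliver the result, e.g. into a response buffer.
    public static long run(List<String> keys, Map<String, Long> region, long min) {
        return generator(keys)
                .map(fetch(region))
                .filter(atLeast(min))
                .mapToLong(Map.Entry::getValue)
                .sum();
    }

    public static void main(String[] args) {
        Map<String, Long> region = Map.of("r1", 10L, "r2", 3L, "r3", 7L);
        // Values 10 and 7 pass the filter; 3 is dropped. Sum = 17.
        System.out.println(run(List.of("r1", "r2", "r3"), region, 5L));
    }
}
```

Parallel execution and partitioning map naturally onto this shape as well, e.g. splitting the key stream across threads before the fetch stage.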
Issue Links
- relates to HBASE-2469 Coprocessors: Distributed query processing (Closed)
- relates to HBASE-3131 Coprocessors: Server side embedding of Cascading/Cascalog operators (Closed)