[PHOENIX-838] Continuous queries - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

Support continuous queries.

As a coprocessor application, Phoenix is well positioned to observe mutations and treat those observations as an event stream.

Continuous queries are persistent queries that run server side. They are typically expressed as structured queries with extensions for defining a bounded subset of a potentially unbounded tuple stream. A Phoenix user could create a materialized view using WINDOW and other OLAP extensions to SQL discussed on PHOENIX-154 to define time- or tuple-based sliding windows, possibly partitioned, and an aggregating or filtering operation over those windows.

Creating the view would trigger instantiation of a long-running distributed task on the cluster for incrementally maintaining the view. ("Task" is meant here as a logical notion; it may not be a separate thread of execution.) As the task receives observer events and performs work, it would update state in memory for on-demand retrieval. For state reconstruction after failure, the WAL could be overloaded with in-window event history, and/or the in-memory state could be periodically checkpointed into shadow stores in the region.
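To make the idea concrete, here is a minimal sketch of the per-query state such a task might keep: a time-based sliding window, partitioned by key, with a sum that is updated incrementally on each observed mutation and can be checkpointed and restored. It is written in Python for brevity and is not Phoenix or HBase code. The class `SlidingWindowSum`, its methods, and the assumption that events arrive in timestamp order per partition are all hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowSum:
    """Hypothetical in-memory state for one continuous query (not a Phoenix API)."""

    def __init__(self, window_ms):
        self.window_ms = window_ms
        # Per-partition queue of (timestamp_ms, value) events still inside the window.
        self.events = defaultdict(deque)
        # Per-partition running sum, maintained incrementally.
        self.sums = defaultdict(int)

    def _evict(self, key, now_ms):
        # Drop events that have slid out of the window and subtract their values.
        q = self.events[key]
        while q and q[0][0] <= now_ms - self.window_ms:
            _, old_value = q.popleft()
            self.sums[key] -= old_value

    def observe(self, key, ts_ms, value):
        # Called once per observed mutation.
        # Assumption: timestamps arrive in non-decreasing order per key.
        self._evict(key, ts_ms)
        self.events[key].append((ts_ms, value))
        self.sums[key] += value

    def query(self, key, now_ms):
        # On-demand retrieval of the current aggregate for one partition.
        self._evict(key, now_ms)
        return self.sums[key]

    def checkpoint(self):
        # Snapshot of the in-window event history, suitable for a shadow store.
        return {
            "window_ms": self.window_ms,
            "events": {k: list(q) for k, q in self.events.items()},
        }

    @classmethod
    def restore(cls, snapshot):
        # Rebuild state after failure by replaying the checkpointed events.
        state = cls(snapshot["window_ms"])
        for key, events in snapshot["events"].items():
            for ts_ms, value in events:
                state.observe(key, ts_ms, value)
        return state
```

Each `observe` call does work proportional to the number of events that expire, so reads never rescan the underlying data. Replaying in-window history in `restore` mirrors the WAL-based recovery option, while `checkpoint` mirrors the shadow-store option.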

Users would pick up the latest state maintained by the continuous query by querying the view. Alternatively, Phoenix could do this transparently for any query if the optimizer determines that the query is equivalent to the view's definition.
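The transparent case can be illustrated with a deliberately naive sketch: reduce each aggregate query to a normalized "shape" and look it up among the registered continuous views. Everything here (`AggregateShape`, `shape`, `route`, and the table and column names) is hypothetical. A real optimizer would need far more than structural equality to establish equivalence.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class AggregateShape:
    """Hypothetical normalized description of a windowed aggregate query."""
    table: str
    aggregate: str
    column: str
    group_by: FrozenSet[str]
    window_ms: int


def shape(table: str, aggregate: str, column: str,
          group_by: Iterable[str], window_ms: int) -> AggregateShape:
    # Normalize case and ignore GROUP BY column order so trivially
    # different spellings of the same query compare equal.
    return AggregateShape(
        table.upper(),
        aggregate.upper(),
        column.upper(),
        frozenset(c.upper() for c in group_by),
        window_ms,
    )


def route(query: AggregateShape,
          views: Dict[AggregateShape, str]) -> Optional[str]:
    # Return the name of a continuous view that already maintains this
    # result, or None if the query must be executed normally.
    return views.get(query)


# Usage: one registered view, one matching query, one non-matching query.
views = {shape("events", "sum", "bytes", ["host", "dc"], 60_000): "bytes_per_host"}
hit = route(shape("EVENTS", "SUM", "BYTES", ["dc", "host"], 60_000), views)
miss = route(shape("EVENTS", "SUM", "BYTES", ["dc", "host"], 30_000), views)
```

Here `hit` resolves to the view name because only the case and GROUP BY order differ, while `miss` is `None` because the window length does not match.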

This could be an important feature for Phoenix. Phoenix and HBase are generally meant to handle data volumes that overwhelm other data management options, so even subsets of the full data may present scale challenges. Many use cases mix ad hoc or exploratory full table scans with aggregates, rollups, or sampling queries over a subset or sample, and users want the latter queries to run as fast as possible. If that work can be done inline while mutations are first being persisted, then we trade some memory and CPU resources up front to eliminate significant IO time that would otherwise dominate later.

Issue Links

is related to: PHOENIX-971 Query server (Closed)

People

Assignee: Unassigned
Reporter: Andrew Kyle Purtell
Votes: 0
Watchers: 5

Dates

Created: 11/Mar/14 23:14
Updated: 06/May/14 23:16