[CHUKWA-567] Create a generic down sampling framework for time series metrics in hbase - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None
Environment: Java 6, Mac OSX 10.6

Description

Large time series data sets can be down sampled in a generic way. This JIRA proposes a general down sampling framework that can be scheduled to run in the background. A configuration file would specify the source table names, the suffixes for the down sampled table names, and the down sampling intervals.

For example:

chukwa.data.sample.tables=SystemMetrics,Hadoop
chukwa.down.sample.suffix=_monthly,_yearly
chukwa.down.sample.frequency=30,360

With this configuration, the framework triggers a down sampling job every 30 minutes and every 360 minutes for each of the SystemMetrics and Hadoop tables. The down sampled data are stored in SystemMetrics_monthly and SystemMetrics_yearly, and in Hadoop_monthly and Hadoop_yearly, respectively.
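As a rough illustration of how the three properties could expand into individual jobs, here is a minimal Python sketch. It assumes that the suffix and frequency lists are paired by position and that every source table gets every pair, which matches the example but is not confirmed as the final design. The helper names `parse_properties` and `expand_jobs` are hypothetical.

```python
# Sketch only: parse_properties and expand_jobs are hypothetical helpers,
# not part of Chukwa.

CONFIG_TEXT = """\
chukwa.data.sample.tables=SystemMetrics,Hadoop
chukwa.down.sample.suffix=_monthly,_yearly
chukwa.down.sample.frequency=30,360
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def expand_jobs(props):
    """Return (source table, target table, frequency in minutes) tuples.

    Assumption: the i-th suffix is paired with the i-th frequency.
    """
    tables = props["chukwa.data.sample.tables"].split(",")
    suffixes = props["chukwa.down.sample.suffix"].split(",")
    frequencies = [int(f) for f in props["chukwa.down.sample.frequency"].split(",")]
    if len(suffixes) != len(frequencies):
        raise ValueError("suffix and frequency lists must have the same length")
    return [
        (table, table + suffix, minutes)
        for table in tables
        for suffix, minutes in zip(suffixes, frequencies)
    ]


jobs = expand_jobs(parse_properties(CONFIG_TEXT))
for source, target, minutes in jobs:
    print(f"{source} -> {target} every {minutes} minutes")
```

For the example configuration this yields four jobs, one per source table and suffix combination.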

The framework will automatically generate a Pig script with the time and configuration parameters filled in, then trigger the script to run. Rows are grouped by time and row key. For columns that cannot be down sampled (non-numeric values), the first value in each group is used.
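The grouping rule can be sketched in plain Python to make the intended behaviour concrete. This is not the Pig script itself. It assumes that numeric columns are averaged (the issue does not name the aggregate function), that timestamps are in milliseconds, and that rows arrive in time order so "first value" is well defined. The function name `down_sample` is hypothetical.

```python
# Sketch only: down_sample is a hypothetical helper. Averaging numeric
# columns is an assumption; the issue does not specify the aggregate.


def is_numeric(value):
    """True for int/float values, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def down_sample(rows, interval_ms):
    """Group rows by (time bucket, row key) and reduce each column.

    rows: iterable of (timestamp_ms, row_key, {column: value}) in time order.
    """
    groups = {}
    for timestamp, row_key, columns in rows:
        bucket = timestamp - timestamp % interval_ms  # start of the interval
        groups.setdefault((bucket, row_key), []).append(columns)

    result = []
    for (bucket, row_key), members in groups.items():
        merged = {}
        for columns in members:
            for name in columns:
                if name in merged:
                    continue  # column already reduced for this group
                values = [m[name] for m in members if name in m]
                if all(is_numeric(v) for v in values):
                    merged[name] = sum(values) / len(values)
                else:
                    merged[name] = values[0]  # non-numeric: keep first value
        result.append((bucket, row_key, merged))
    return result


THIRTY_MINUTES_MS = 30 * 60 * 1000
rows = [
    (0, "host1", {"cpu": 10.0, "os": "linux"}),
    (60000, "host1", {"cpu": 30.0, "os": "linux"}),
    (1800000, "host1", {"cpu": 50.0, "os": "linux"}),
]
result = down_sample(rows, THIRTY_MINUTES_MS)
```

Here the first two rows fall into the same 30 minute bucket for `host1`, so their `cpu` values are averaged while `os` keeps its first value. The third row starts a new bucket.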

Oozie can serve as the job scheduler for the framework, so I only need to write the Pig script and an Oozie workflow that plugs in the parameters. Suggestions and recommendations are welcome.

Issue Links

incorporates: CHUKWA-570 Create a meta data table in HBase to keep track of columns (Open)

People

Assignee: Eric Yang
Reporter: Eric Yang
Votes: 0
Watchers: 0

Dates

Created: 18/Dec/10 07:57
Updated: 30/Jan/11 18:33