[PIG-2167] CUBE operation in Pig - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
- gsoc2012
- mentor

Description

Computing aggregates over a cube of several dimensions is a common operation in data warehousing.

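In SQL this is usually written as "GROUP BY CUBE(dim1, dim2, dim3)", or "GROUP BY dim1, dim2, dim3 WITH CUBE" in some dialects. In addition to the aggregation over all of dim1, dim2, and dim3, it produces aggregations for every subset of the dimensions: just dim1, just dim1 and dim2, and so on. NULL is generally used to represent "all" in the dimensions that were aggregated away.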
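
As a rough illustration of which groupings a cube over three dimensions produces, the sketch below enumerates every grouping key for a single row, using None in place of NULL for "all". The function name cube_dimensions and the sample values are hypothetical and are not taken from any Pig patch.

```python
from itertools import combinations


def cube_dimensions(row):
    """Yield every cube grouping key for one tuple of dimension values.

    A dimension that is aggregated away is replaced by None ("all").
    """
    n = len(row)
    # Go from the full grouping (keep all n dimensions) down to the grand total.
    for k in range(n, -1, -1):
        for keep in combinations(range(n), k):
            yield tuple(row[i] if i in keep else None for i in range(n))


# Hypothetical sample row with three dimension values.
keys = list(cube_dimensions(("US", "web", "2012")))
for key in keys:
    print(key)
```

For n dimensions this yields 2^n keys per input row, which is why the naive approach becomes expensive as dimensions are added.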

A presentation by Arnab Nandi describes how one might implement efficient cubing in Map-Reduce here: http://pdf.cx/44wrk

We can start with the naive solution, which only works for algebraic measures (those that can be computed from partial results, such as SUM, COUNT, or AVG), and work up from there.
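
A minimal sketch of what such a naive solution could look like, assuming each input row is a tuple of dimension values followed by one numeric measure: every row is expanded into all of its cube keys, and partial (sum, count) pairs are merged per key. The name naive_cube and the sample data are hypothetical, and this is not the code from the attached patches. It relies on the measure being algebraic, since AVG is derived from the merged partial sums and counts.

```python
from collections import defaultdict
from itertools import product


def naive_cube(rows, n_dims):
    """Compute SUM, COUNT and AVG of the measure for every cube grouping.

    rows   -- iterable of tuples: n_dims dimension values, then the measure
    n_dims -- number of leading dimension fields in each row
    """
    partials = defaultdict(lambda: [0.0, 0])  # key -> [running sum, running count]
    for row in rows:
        dims, value = row[:n_dims], row[n_dims]
        # Each mask decides which dimensions are kept and which become "all".
        for mask in product((True, False), repeat=n_dims):
            key = tuple(d if keep else None for d, keep in zip(dims, mask))
            part = partials[key]
            part[0] += value
            part[1] += 1
    # AVG is algebraic: it is finalized from the merged partials.
    return {key: (s, c, s / c) for key, (s, c) in partials.items()}


# Hypothetical sample data: (dim1, dim2, measure).
rows = [("a", "x", 10), ("a", "y", 20), ("b", "x", 30)]
cube = naive_cube(rows, 2)
print(cube[("a", None)])   # aggregate over all dim2 values where dim1 == "a"
print(cube[(None, None)])  # grand total
```

A holistic measure such as a median cannot be merged from partial results this way, which is where a more elaborate distributed approach would be needed.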

This is a candidate project for Google Summer of Code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012.

Attachments

- Pig-Cubing-Performance.png (21/Mar/12 23:41, 20 kB, Prasanth Jayachandran)
- PIG-2167.4.patch (06/May/12 00:18, 62 kB, Prasanth Jayachandran)
- PIG-2167.3.patch (01/May/12 05:30, 59 kB, Prasanth Jayachandran)
- PIG-2167.2.patch (12/Apr/12 07:07, 58 kB, Prasanth Jayachandran)
- PIG-2167.1.patch (21/Mar/12 23:46, 57 kB, Prasanth Jayachandran)

Sub-Tasks

1.	CubeDimensions UDF	Closed	Dmitriy V. Ryaboy
2.	Implement Naive CUBE operator	Closed	Prasanth Jayachandran
3.	Handling legitimate NULL values	Closed	Prasanth Jayachandran
4.	Implementing RollupDimensions UDF and adding ROLLUP clause in CUBE operator	Closed	Prasanth Jayachandran
5.	MR-Cube implementation (Distributed cubing for holistic measures)	Open	Prasanth Jayachandran

People

Assignee: Prasanth Jayachandran

Reporter: Dmitriy V. Ryaboy

Votes: 6

Watchers: 24

Dates

Created: 15/Jul/11 15:11

Updated: 25/Dec/13 12:04