Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Computing aggregates over a cube of several dimensions is a common operation in data warehousing.

      Borrowing SQL's "WITH CUBE" modifier, the syntax could be "GROUP relation BY dim1, dim2, dim3 WITH CUBE". In addition to the full (dim1, dim2, dim3) grouping, this produces aggregations for every subset of the dimensions: just dim1, just dim1 and dim2, and so on. NULL is generally used to represent "all".
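
      To make the cube semantics concrete, here is a minimal in-memory sketch in Python (not Pig Latin, and not the implementation proposed in this ticket). The function name cube_sum, the "sales" measure, and the sample rows are hypothetical; None plays the role of NULL, meaning "all".

```python
from collections import defaultdict
from itertools import product


def cube_sum(rows, dims, measure):
    """Sum `measure` for every subset of `dims`; None stands for "all"."""
    totals = defaultdict(int)
    for row in rows:
        # For each dimension, either keep the row's value or roll it up to None.
        choices = [(row[d], None) for d in dims]
        # product() enumerates all 2^n keep/roll-up combinations for this row.
        for key in product(*choices):
            totals[key] += row[measure]
    return dict(totals)


# Hypothetical sample data, for illustration only.
rows = [
    {"dim1": "a", "dim2": "x", "dim3": "p", "sales": 10},
    {"dim1": "a", "dim2": "y", "dim3": "p", "sales": 5},
    {"dim1": "b", "dim2": "x", "dim3": "q", "sales": 7},
]

cube = cube_sum(rows, ["dim1", "dim2", "dim3"], "sales")
grand_total = cube[(None, None, None)]   # aggregate over all dimensions
only_dim1_a = cube[("a", None, None)]    # aggregate for dim1 = "a" only
```

      Each input row contributes to 2^n groups for n dimensions, which is why this direct expansion is considered naive for larger cubes.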

      A presentation by Arnab Nandi describes how one might implement efficient cubing in Map-Reduce here: http://pdf.cx/44wrk

      We can start with the naive solution, which works only for algebraic measures, and build up from there.
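
      As a rough illustration of what "algebraic" means here: a measure is algebraic when it can be computed from small, fixed-size partial results that are merged in any order. The Python sketch below assumes an average computed from (sum, count) pairs. The helper names initial, merge, and final are hypothetical, and the chunks stand in for data split across mappers.

```python
from functools import reduce


def initial(value):
    """Turn one input value into a partial (sum, count) pair."""
    return (value, 1)


def merge(a, b):
    """Combine two partial results; the order of merging does not matter."""
    return (a[0] + b[0], a[1] + b[1])


def final(partial):
    """Produce the average from the merged (sum, count) pair."""
    total, count = partial
    return total / count


# Hypothetical data split into two chunks, as if handled by separate mappers.
chunks = [[10, 5], [7]]

# Each chunk is reduced locally, then the partials are merged globally.
partials = [reduce(merge, map(initial, chunk)) for chunk in chunks]
average = final(reduce(merge, partials))
```

      A measure such as median has no fixed-size partial result of this kind, so the naive approach does not cover it.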

      This is a candidate project for Google Summer of Code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012

      1. PIG-2167.1.patch
        57 kB
        Prasanth Jayachandran
      2. PIG-2167.2.patch
        58 kB
        Prasanth Jayachandran
      3. PIG-2167.3.patch
        59 kB
        Prasanth Jayachandran
      4. PIG-2167.4.patch
        62 kB
        Prasanth Jayachandran
      5. Pig-Cubing-Performance.png
        20 kB
        Prasanth Jayachandran

          People

          • Assignee:
            Prasanth Jayachandran
            Reporter:
            Dmitriy V. Ryaboy
          • Votes:
            6
            Watchers:
            24
