Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Computing aggregates over a cube of several dimensions is a common operation in data warehousing.

      A SQL-style syntax is "GROUP relation BY dim1, dim2, dim3 WITH CUBE", which, in addition to grouping by all of dim1, dim2, and dim3, produces aggregates for every subset of the dimensions (just dim1, just dim1 and dim2, and so on). NULL is generally used in the output to represent "all" values of a dimension.
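
      To make the grouping semantics concrete, here is a minimal Python sketch that lists every grouping key a cube would produce for one input tuple. The function name cube_keys is hypothetical and is not part of Pig; None stands in for NULL.

```python
from itertools import product


def cube_keys(dims):
    """Return every cube grouping key for one tuple of dimension values.

    None plays the role of SQL NULL, i.e. "all values" of that dimension.
    """
    # For each dimension, either keep its value or replace it with None.
    return [tuple(choice) for choice in product(*[(value, None) for value in dims])]


keys = cube_keys(("US", "2012", "web"))
print(len(keys))  # 8, i.e. 2**3 combinations for three dimensions
```

      The list includes the full key ("US", "2012", "web"), partially collapsed keys such as ("US", None, None), and the grand-total key (None, None, None).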

      A presentation by Arnab Nandi describes how one might implement efficient cubing in Map-Reduce here: http://pdf.cx/44wrk

      We can start with the naive solution, which works only for algebraic measures (those that can be computed by merging partial aggregates, such as SUM, COUNT, or AVG), and work up from there.
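
      As a rough illustration of the naive approach, the Python sketch below emits one partial aggregate per cube group for every input row and merges the partials by addition. It is an assumption-laden sketch, not the implementation in the attached patches: the function name naive_cube_avg and the row layout (dimensions first, measure last) are hypothetical.

```python
from collections import defaultdict
from itertools import product


def naive_cube_avg(rows, num_dims):
    """Average the measure for every cube group.

    rows: iterable of tuples (dim_1, ..., dim_n, measure).
    Returns {group_key: average}; None in a key means "all".
    """
    # "Map" step: each row contributes a partial (sum, count) to all
    # 2**num_dims groups it belongs to.
    # "Combine/reduce" step: partials merge by element-wise addition,
    # which is what makes AVG an algebraic measure.
    partials = defaultdict(lambda: [0.0, 0])
    for row in rows:
        dims, measure = row[:num_dims], row[num_dims]
        for key in product(*[(d, None) for d in dims]):
            acc = partials[key]
            acc[0] += measure
            acc[1] += 1
    return {key: total / count for key, (total, count) in partials.items()}


rows = [("US", "web", 10.0), ("US", "mobile", 20.0), ("UK", "web", 30.0)]
result = naive_cube_avg(rows, 2)
print(result[("US", None)])   # 15.0
print(result[(None, None)])   # 20.0
```

      Every row is emitted 2^n times for n dimensions, which is why this approach is naive. It also does not extend to measures such as median, which cannot be rebuilt from fixed-size partial aggregates.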

      This is a candidate project for Google Summer of Code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012

        Attachments

        1. Pig-Cubing-Performance.png
          20 kB
          Prasanth Jayachandran
        2. PIG-2167.4.patch
          62 kB
          Prasanth Jayachandran
        3. PIG-2167.3.patch
          59 kB
          Prasanth Jayachandran
        4. PIG-2167.2.patch
          58 kB
          Prasanth Jayachandran
        5. PIG-2167.1.patch
          57 kB
          Prasanth Jayachandran

            People

            • Assignee:
              prasanth_j Prasanth Jayachandran
              Reporter:
              dvryaboy Dmitriy V. Ryaboy
            • Votes:
              6
              Watchers:
              24
