• Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:


      Computing aggregates over a cube of several dimensions is a common operation in data warehousing.

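      The standard SQL syntax is "GROUP relation BY dim1, dim2, dim3 WITH CUBE", which, in addition to the full (dim1, dim2, dim3) grouping, also produces aggregations for every smaller combination of the dimensions: just dim1, just dim1 and dim2, and so on, down to the grand total. NULL is generally used to represent "all" for a dimension that has been aggregated away.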
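
      To make the grouping semantics concrete, here is a minimal sketch in Python (not Pig code) that enumerates the groupings a cube over three dimensions would produce and builds the group key for one row, with None standing in for NULL / "all". The helper names cube_groupings and cube_key are hypothetical and exist only for this illustration.

```python
from itertools import combinations


def cube_groupings(dims):
    """Return every subset of dims as a tuple, from the full set down to the empty set."""
    groupings = []
    for size in range(len(dims), -1, -1):
        groupings.extend(combinations(dims, size))
    return groupings


def cube_key(row, dims, kept):
    """Build the group key for one grouping; None marks a dimension rolled up to "all"."""
    return tuple(row[d] if d in kept else None for d in dims)


dims = ("dim1", "dim2", "dim3")
groupings = cube_groupings(dims)  # 2**3 = 8 groupings in total

row = {"dim1": "a", "dim2": "b", "dim3": "c"}
keys = [cube_key(row, dims, kept) for kept in groupings]

for key in keys:
    print(key)
```

      Each input row therefore contributes to 2^n groups for n dimensions, which is why the cost of cubing grows quickly with the number of dimensions.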

      A presentation by Arnab Nandi describes how one might implement efficient cubing in Map-Reduce here:

      We can start with the naive solution which only works for algebraic measures, and work up from there.
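
      As a rough illustration of what the naive approach could look like, the Python sketch below emits one key per combination of dimensions for every record (the "map" side) and merges partial states per key (the "combine/reduce" side). It uses AVG, which is algebraic because it can be rebuilt from mergeable (sum, count) partials. The function names, the sample records, and the single-process loop standing in for Map-Reduce are all assumptions made for this example; it is not the design used in the attached patches.

```python
from itertools import product


def map_cube(record, dims, measure):
    """Emit one (key, partial) pair per subset of dimensions; None marks "all"."""
    value = record[measure]
    for mask in product((True, False), repeat=len(dims)):
        key = tuple(record[d] if keep else None for d, keep in zip(dims, mask))
        yield key, (value, 1)  # partial state for AVG: (sum, count)


def combine(a, b):
    """Merge two partial states; this mergeability is what makes the measure algebraic."""
    return (a[0] + b[0], a[1] + b[1])


def naive_cube_avg(records, dims, measure):
    """Group all emitted pairs by key, merge their partials, and finalize the average."""
    partials = {}
    for record in records:
        for key, partial in map_cube(record, dims, measure):
            partials[key] = combine(partials[key], partial) if key in partials else partial
    return {key: total / count for key, (total, count) in partials.items()}


# Hypothetical sample data, for illustration only.
records = [
    {"region": "east", "product": "x", "sales": 10},
    {"region": "east", "product": "y", "sales": 20},
    {"region": "west", "product": "x", "sales": 30},
]
result = naive_cube_avg(records, ("region", "product"), "sales")

for key in sorted(result, key=repr):
    print(key, result[key])
```

      Holistic measures such as an exact median or distinct count cannot be merged from fixed-size partials this way, which is the limitation the naive solution has.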

      This is a candidate project for Google Summer of Code 2012. More information about the program can be found at

      1. Pig-Cubing-Performance.png
        20 kB
        Prasanth Jayachandran
      2. PIG-2167.4.patch
        62 kB
        Prasanth Jayachandran
      3. PIG-2167.3.patch
        59 kB
        Prasanth Jayachandran
      4. PIG-2167.2.patch
        58 kB
        Prasanth Jayachandran
      5. PIG-2167.1.patch
        57 kB
        Prasanth Jayachandran




          • Assignee:
            Prasanth Jayachandran
            Dmitriy V. Ryaboy
          • Votes:
            6
          • Watchers:
            24


            • Created: