Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: gsoc2012, mentor

      Description

      Computing aggregates over a cube of several dimensions is a common operation in data warehousing.

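      A typical SQL-style syntax is "GROUP relation BY dim1, dim2, dim3 WITH CUBE", which, in addition to the full (dim1, dim2, dim3) grouping, produces aggregations for every other subset of the dimensions: just dim1, dim1 and dim2, and so on, down to the grand total. NULL is generally used to represent "all" in a dimension that has been aggregated away.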
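
      To make the "every subset" semantics concrete, here is a minimal Java sketch (Java because Pig itself is written in Java). It assumes a tuple's dimension values arrive as a list of strings; the class CubeRegions and its method regions are hypothetical illustrations, not part of Pig. It expands one tuple into its 2^n grouping keys, using null for "all".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig: expands one tuple into its CUBE grouping keys.
public class CubeRegions {
    /**
     * Returns one grouping key per subset of the dimensions (2^n keys for n dimensions).
     * Bit i of the mask decides whether dimension i is kept or replaced by null ("all").
     * The first key is the full grouping; the last key is the grand total.
     */
    public static List<List<String>> regions(List<String> dims) {
        int n = dims.size();
        List<List<String>> out = new ArrayList<>();
        for (int mask = (1 << n) - 1; mask >= 0; mask--) {
            List<String> key = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                key.add((mask & (1 << i)) != 0 ? dims.get(i) : null);
            }
            out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints 8 keys, starting with [US, 2011, web] and ending with [null, null, null].
        for (List<String> key : regions(Arrays.asList("US", "2011", "web"))) {
            System.out.println(key);
        }
    }
}
```

      Keeping only the prefix groupings (dim1; dim1 and dim2; dim1, dim2 and dim3) would give SQL's ROLLUP instead; CUBE covers all 2^n subsets.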

      A presentation by Arnab Nandi describes how efficient cubing might be implemented in MapReduce: http://pdf.cx/44wrk

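      We can start with the naive solution, which works only for algebraic measures (measures such as SUM, COUNT, or AVG that can be computed by merging partial aggregates, as opposed to holistic measures such as median), and work up from there.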
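
      A minimal sketch of the naive approach, under stated assumptions: it is a single-process Java simulation of map and reduce (not Pig or Hadoop code; NaiveCube, SumCount, and cube are hypothetical names), with AVG as the measure. Each record is emitted under all 2^n region keys carrying a partial (sum, count); partials that share a key are merged, and the average is computed only at the end. The merge step is what makes the measure algebraic: fixed-size partials combine associatively, so a combiner could apply them early, whereas a holistic measure such as median has no fixed-size partial.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process simulation of the naive cube; not Pig or Hadoop code.
public class NaiveCube {
    /** Partial aggregate for AVG. Algebraic: partials merge by adding sums and counts. */
    static final class SumCount {
        final double sum;
        final long count;

        SumCount(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        double avg() {
            return sum / count;
        }
    }

    /** All 2^n grouping keys for one record's dimensions; null means "all". */
    static List<List<String>> regions(List<String> dims) {
        int n = dims.size();
        List<List<String>> out = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> key = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                key.add((mask & (1 << i)) != 0 ? dims.get(i) : null);
            }
            out.add(key);
        }
        return out;
    }

    /**
     * "Map": emit (region key, partial) for every region of every record.
     * "Reduce": merge all partials sharing a region key (a HashMap stands in for the shuffle).
     */
    static Map<List<String>, SumCount> cube(List<List<String>> dimRows, List<Double> measures) {
        Map<List<String>, SumCount> result = new HashMap<>();
        for (int r = 0; r < dimRows.size(); r++) {
            SumCount partial = new SumCount(measures.get(r), 1);
            for (List<String> key : regions(dimRows.get(r))) {
                result.merge(key, partial, SumCount::merge);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("US", "web"),
                Arrays.asList("US", "mobile"),
                Arrays.asList("UK", "web"));
        List<Double> revenue = Arrays.asList(10.0, 20.0, 30.0);
        Map<List<String>, SumCount> result = cube(rows, revenue);
        System.out.println(result.get(Arrays.asList("US", null)).avg()); // 15.0
        System.out.println(result.get(Arrays.asList(null, null)).avg()); // 20.0 (grand total)
    }
}
```

      Note the cost of this approach: every input record is replicated 2^n times before aggregation, which grows quickly with the number of dimensions.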

      This is a candidate project for Google Summer of Code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012

      Attachments

      1. Pig-Cubing-Performance.png
        20 kB
        Prasanth J
      2. PIG-2167.1.patch
        57 kB
        Prasanth J
      3. PIG-2167.2.patch
        58 kB
        Prasanth J
      4. PIG-2167.3.patch
        59 kB
        Prasanth J
      5. PIG-2167.4.patch
        62 kB
        Prasanth J

        Activity

        Dmitriy V. Ryaboy created the issue.
        Olga Natkovich made changes:
          Fix Version/s: (none) → 0.10 [ 12316246 ]
        Daniel Dai made changes:
          Labels: (none) → gsoc2012
        Daniel Dai made changes:
          Description: updated to the current text above (adds the Google Summer of Code 2012 paragraph)
        Prasanth J made changes:
          Attachment: Pig-Cubing-Performance.png [ 12519368 ]
        Prasanth J made changes:
          Attachment: PIG-2167.1.patch [ 12519373 ]
        Dmitriy V. Ryaboy made changes:
          Labels: gsoc2012 → gsoc2012 mentor
        Dmitriy V. Ryaboy made changes:
          Assignee: (none) → Dmitriy V. Ryaboy [ dvryaboy ]
        Prasanth J made changes:
          Attachment: PIG-2167.2.patch [ 12522382 ]
        Prasanth J made changes:
          Attachment: PIG-2167.3.patch [ 12525156 ]
        Prasanth J made changes:
          Attachment: PIG-2167.4.patch [ 12525754 ]
        Dmitriy V. Ryaboy made changes:
          Assignee: Dmitriy V. Ryaboy [ dvryaboy ] → Prasanth J [ prasanth_j ]

          People

          • Assignee:
            Prasanth J
            Reporter:
            Dmitriy V. Ryaboy
          • Votes:
            6
            Watchers:
            24

            Dates

            • Created:
              Updated:
