PIG-2167: CUBE operation in Pig


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Computing aggregates over a cube of several dimensions is a common operation in data warehousing.

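      In SQL this is written "GROUP BY CUBE (dim1, dim2, dim3)" in the standard, or "GROUP BY dim1, dim2, dim3 WITH CUBE" in some dialects; a Pig equivalent could look like "GROUP relation BY dim1, dim2, dim3 WITH CUBE". In addition to the full dim1-2-3 grouping, it produces aggregations for every subset of the dimensions: just dim1, just dim1 and dim2, and so on, down to the grand total. NULL is generally used to represent "all" in a dimension that has been aggregated away.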
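
      To make the set of groupings concrete, here is a minimal Python sketch (not Pig code, and not taken from the attached patches) that expands one row's dimension values into every cube key, using None for "all". The function name cube_keys and the sample values are made up for illustration, and the sketch assumes the input values themselves are never null.

```python
from itertools import product


def cube_keys(row_dims):
    """Return every cube grouping key for one row's dimension values.

    Each dimension is either kept as-is or replaced by None ("all"),
    so n dimensions produce 2**n keys.
    """
    # For each dimension, the two choices are: the real value, or None.
    choices = [(value, None) for value in row_dims]
    return list(product(*choices))


# Hypothetical row with three dimensions: dim1, dim2, dim3.
keys = cube_keys(("US", "web", 2012))
for key in keys:
    print(key)
```

      With three dimensions this yields 8 keys, from the fully specified one to the all-None key that stands for the grand total.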

      A presentation by Arnab Nandi describes how one might implement efficient cubing in Map-Reduce here: http://pdf.cx/44wrk

      We can start with the naive solution, which only works for algebraic measures (ones such as SUM or COUNT that can be combined from partial results), and work up from there.
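
      As a rough illustration of what a naive solution might look like, the following Python sketch emits every cube key for every row and sums a measure per key. It is an assumption about the general shape of such an approach, not the implementation in the attached patches or in the presentation linked above; the names naive_cube_sum and merge_partials and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import product


def naive_cube_sum(rows, num_dims):
    """Sum the measure for every cube key, one row at a time.

    Each row is a tuple of num_dims dimension values followed by a
    numeric measure. None in a key means "all" for that dimension.
    """
    totals = defaultdict(int)
    for row in rows:
        dims, measure = row[:num_dims], row[num_dims]
        # Emit all 2**num_dims keys for this row (map-side expansion).
        for key in product(*[(d, None) for d in dims]):
            totals[key] += measure
    return dict(totals)


def merge_partials(a, b):
    """Combine two partial cubes; valid because SUM is algebraic."""
    merged = defaultdict(int, a)
    for key, value in b.items():
        merged[key] += value
    return dict(merged)


# Hypothetical data: (country, channel, amount).
rows = [("US", "web", 10), ("US", "mobile", 5), ("IN", "web", 7)]
cube = naive_cube_sum(rows, 2)

# Cubing two splits separately and merging gives the same result.
split_cube = merge_partials(naive_cube_sum(rows[:1], 2),
                            naive_cube_sum(rows[1:], 2))
print(cube[(None, None)], cube == split_cube)
```

      The merge step is what restricts this sketch to algebraic measures: a measure such as a median cannot be rebuilt from per-split results this way. The 2**n expansion per row is also why this counts as the naive approach.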

      This is a candidate project for Google Summer of Code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012

      Attachments

        1. PIG-2167.4.patch
          62 kB
          Prasanth Jayachandran
        2. PIG-2167.3.patch
          59 kB
          Prasanth Jayachandran
        3. PIG-2167.2.patch
          58 kB
          Prasanth Jayachandran
        4. PIG-2167.1.patch
          57 kB
          Prasanth Jayachandran
        5. Pig-Cubing-Performance.png
          20 kB
          Prasanth Jayachandran


          People

            prasanth_j Prasanth Jayachandran
            dvryaboy Dmitriy V. Ryaboy
