  1. Pig
  2. PIG-580

PERFORMANCE: Combiner should also be used when there are distinct aggregates in a foreach following a group provided there are no non-algebraics in the foreach

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.2.0
    • Component/s: None
    • Labels: None
    • Patch Available

    Description

      Currently Pig uses the combiner only when a foreach follows a group and every element in the foreach's generate is one of the following:
      1) simple project of the "group" column
      2) Algebraic UDF

      The above conditions exclude use of the combiner for distinct aggregates, even though the distinct operation itself is combinable (irrespective of whether it feeds an algebraic or non-algebraic UDF). So the following foreach should also be combinable:

      ..
      b = group a by $0;
      c = foreach b { x = distinct a; generate group, COUNT(x), SUM(x.$1); }
      

      The combiner optimizer should cause the distinct to be combined and the final combine output should feed the COUNT() and SUM() in the reduce.
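
The reason distinct is combinable can be illustrated with a small simulation. This is only a sketch in plain Python, not Pig's implementation: the function names `combine_distinct` and `reduce_distinct` and the sample splits are hypothetical. It assumes each map split deduplicates its own tuples per group key, and that the reducer merges those partial sets before applying COUNT and SUM.

```python
from collections import defaultdict


def combine_distinct(records):
    """Combiner step (hypothetical): drop duplicate tuples per group key within one split."""
    per_key = defaultdict(set)
    for rec in records:
        per_key[rec[0]].add(rec)  # rec[0] plays the role of $0, the group key
    return per_key


def reduce_distinct(partials):
    """Reduce step (hypothetical): merge per-split sets, then apply COUNT(x) and SUM(x.$1)."""
    merged = defaultdict(set)
    for partial in partials:
        for key, tuples in partial.items():
            merged[key] |= tuples  # set union removes duplicates across splits
    return {
        key: (len(tuples), sum(t[1] for t in tuples))
        for key, tuples in merged.items()
    }


# Two illustrative map splits of relation "a"; the data is made up.
split_1 = [("k1", 5), ("k1", 5), ("k2", 7)]
split_2 = [("k1", 5), ("k1", 3), ("k2", 7)]

# Distinct applied per split first (combiner), then merged in the reduce.
with_combiner = reduce_distinct([combine_distinct(split_1), combine_distinct(split_2)])

# Distinct applied once over all the data, as if no combiner had run.
without_combiner = reduce_distinct([combine_distinct(split_1 + split_2)])
```

Both paths give the same per-group result because a union of sets is unaffected by earlier local deduplication, so the combiner only reduces the volume of data shuffled to the reducer.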

      Attachments

        1. PIG-580.patch
          47 kB
          Pradeep Kamath
        2. PIG-580-v2.patch
          47 kB
          Pradeep Kamath

          People

            Assignee: Pradeep Kamath (pkamath)
            Reporter: Pradeep Kamath (pkamath)