PIG-4843: Turn off combiner in reducer vertex for Tez if bags are in combine plan

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      B = group A by key;
      C = foreach B {
          key_value = A.key_value;
          distinct_key_value = DISTINCT key_value;
          generate group, MIN(A.key_value) as min_value, MAX(A.key_value) as max_value, COUNT(distinct_key_value) as distinct_values;
      }
      

      In the above example, the combine plan holds the DISTINCT bag, which causes an OOM (out of memory) error when the combiner is run by the MergeManager in the reducer. We did not have this issue with MapReduce because, with the new API, the combiner has not been run in the reducer so far (MAPREDUCE-5221).
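      The memory behaviour described above can be illustrated with a small model. This is only a sketch in Python, not Pig or Tez code: the functions `partial` and `combine` and the record layout are hypothetical names chosen for illustration. It assumes that a combiner merges per-key partial results, and shows that MIN and MAX need constant state while the DISTINCT bag grows with the number of distinct values it has to hold.

```python
# Illustrative model only; this is not Pig or Tez code.
# `partial` and `combine` are hypothetical helpers for this sketch.

def partial(values):
    """Build the partial aggregate one spill would carry for a single key."""
    values = list(values)
    return {
        "min": min(values),
        "max": max(values),
        # The distinct bag must keep every distinct value seen so far.
        "distinct": set(values),
    }


def combine(partials):
    """Merge partial aggregates for one key, as a combiner would."""
    merged_distinct = set()
    for p in partials:
        # Unlike min/max, this state grows with the data being merged.
        merged_distinct |= p["distinct"]
    return {
        "min": min(p["min"] for p in partials),  # constant-size state
        "max": max(p["max"] for p in partials),  # constant-size state
        "distinct": merged_distinct,
    }


# Five spills for the same key, each with 1000 values not seen in the others.
spills = [partial(range(i * 1000, (i + 1) * 1000)) for i in range(5)]
merged = combine(spills)
print(merged["min"], merged["max"], len(merged["distinct"]))
```

      In this model the merged result keeps two numbers for MIN and MAX but 5000 entries for the distinct bag, which is the kind of growth that can exhaust memory when the merge happens inside the reducer.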

        Attachments

        1. PIG-4843-1.patch
          9 kB
          Rohini Palaniswamy

            People

            • Assignee: Rohini Palaniswamy
            • Reporter: Rohini Palaniswamy
            • Votes: 0
            • Watchers: 2
