PIG-4843: Turn off combiner in reducer vertex for Tez if bags are in combine plan

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      B = group A by key;
      C = foreach B {
          key_value = A.key_value;
          distinct_key_value = DISTINCT key_value;
          generate group, MIN(A.key_value) as min_value, MAX(A.key_value) as max_value, COUNT(distinct_key_value) as distinct_values;
      }
      

      In the example above, the combine plan holds the DISTINCT bag, which causes an out-of-memory error (OOM) when the combiner is run by the MergeManager in the reducer. This issue did not occur with MapReduce because, so far, the combiner does not run in the reducer for the new API (MAPREDUCE-5221).
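
      For versions without this fix, one blunt workaround might be to disable the combiner for the whole script through the pig.exec.nocombiner property. The sketch below is not what the attached patch does (the patch targets only the reducer vertex in Tez), and it gives up the combiner's benefit for MIN and MAX as well. The load path and schema are hypothetical, added only so the script is complete.

```pig
-- Workaround sketch: turn the combiner off for the entire script.
-- This is broader than the fix described in this issue.
set pig.exec.nocombiner true;

-- Assumption: hypothetical input path and schema, for illustration only.
A = load 'input/key_values' as (key:chararray, key_value:long);

B = group A by key;
C = foreach B {
    key_value = A.key_value;
    -- The DISTINCT bag is what ends up in the combine plan when the combiner is enabled.
    distinct_key_value = DISTINCT key_value;
    generate group, MIN(A.key_value) as min_value, MAX(A.key_value) as max_value, COUNT(distinct_key_value) as distinct_values;
}
dump C;
```

      With the combiner disabled, all aggregation happens in the reducer, so shuffle volume can grow; whether that trade-off is acceptable depends on the data.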

        Attachments

        1. PIG-4843-1.patch
          9 kB
          Rohini Palaniswamy

            People

            • Assignee: Rohini Palaniswamy
            • Reporter: Rohini Palaniswamy
