Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
-
Reviewed
Description
B = group A by key; C = foreach B { key_value = A.key_value; distinct_key_value = DISTINCT key_value; generate group, MIN(A.key_value) as min_value, MAX(A.key_value) as max_value, COUNT(distinct_key_value) as distinct_values; }
In the above example, the combine plan holds the Distinct bag and it causes OOM when combiner is run by the MergeManager in the reducer. We did not have this issue with mapreduce as combiner is not running in reducer for new API till now (MAPREDUCE-5221)