Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
data_group = GROUP A BY (f1, f2) PARALLEL 100;
group_result = FOREACH data_group {
    B = LIMIT A.f3 1;
    GENERATE group, SUM(A.f3), SUM(A.f4), SUM(A.f5), SUM(A.f6), FLATTEN(B);
};
A script like this runs with the combiner optimization turned off, so it consumes a lot of memory and is slow. We should implement the nested LIMIT using the combiner in cases like this, so that the SUM calls can still be partially aggregated on the map side.
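To illustrate why LIMIT should be a candidate for the combiner, here is a minimal sketch in Python (not Pig code, and not Pig's actual implementation). It assumes LIMIT n can be split into three phases, loosely mirroring the initial / intermediate / final phases of Pig's Algebraic interface. The function names and the sample f3 values are hypothetical and exist only for illustration.

```python
from itertools import islice


def limit_initial(values, n):
    """Map-side step: keep at most n values from one input split."""
    return list(islice(values, n))


def limit_intermediate(partials, n):
    """Combiner step: merge partial results and keep at most n values."""
    merged = (v for part in partials for v in part)
    return list(islice(merged, n))


def limit_final(partials, n):
    """Reduce-side step: same merge-and-truncate as the combiner step."""
    return limit_intermediate(partials, n)


# Hypothetical f3 values for a single (f1, f2) group, spread over three splits.
splits = [[7, 3, 9], [4], [8, 1]]
n = 1

# Each split is truncated before anything is sent to the reducer.
partials = [limit_initial(s, n) for s in splits]

# The combiner may run on any subset of the partial results.
combined = [
    limit_intermediate(partials[:2], n),
    limit_intermediate(partials[2:], n),
]

# The reducer sees at most n values per combined partial result.
result = limit_final(combined, n)
print(result)
```

Because every phase emits at most n values, the reducer never has to hold the full bag for a group in memory, which is the saving the issue is asking for. LIMIT does not guarantee which tuples are returned, so any n values from the group are an acceptable result.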