Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
As reported by Scott Carey in PIG-479, the combiner is not used for COGROUP, even when the functions applied to the bags are algebraic.
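To illustrate why algebraic functions make a combiner worthwhile, here is a minimal sketch in plain Python (not Pig code). It assumes a simplified model in which each map task emits (key, value) records. The function names and sample data are hypothetical and exist only for this illustration. Functions such as COUNT_STAR and SUM can be computed as partial results per map task and then merged on the reduce side, so far fewer records need to be shuffled.

```python
from collections import defaultdict


def map_side_combine(records):
    """Compute a partial (count, sum) per key for one map task's records."""
    partial = defaultdict(lambda: [0, 0])
    for key, value in records:
        partial[key][0] += 1      # partial COUNT_STAR
        partial[key][1] += value  # partial SUM
    return {key: tuple(acc) for key, acc in partial.items()}


def reduce_side_merge(partials):
    """Merge the per-map partial results into the final (count, sum) per key."""
    final = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (count, total) in partial.items():
            final[key][0] += count
            final[key][1] += total
    return {key: tuple(acc) for key, acc in final.items()}


# Hypothetical input of two map tasks (illustration only).
map1 = [("k1", 10), ("k1", 20), ("k2", 5)]
map2 = [("k1", 30), ("k2", 15), ("k2", 25)]

partials = [map_side_combine(map1), map_side_combine(map2)]
result = reduce_side_merge(partials)

# Records crossing the shuffle with and without map-side combining.
shuffled_with_combiner = sum(len(p) for p in partials)
shuffled_without_combiner = len(map1) + len(map2)
```

In this toy model the combined path shuffles one record per key per map task instead of one record per input row, which is the kind of saving the reporter expects for the co-group case.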
Quoting from the comment:
"For example, I'm not quite sure why this one doesn't use a combiner - it reads ~350x as much input bytes from HDFS as its reduce output, a combiner would be very effective:
J = COGROUP
    UV BY (s, d, h, g, p, pa, st) OUTER,
    UC BY (s, d, h, g, p, pa, st) OUTER,
    AT BY (s, d, h, g, p, pa, st) OUTER,
    V BY (s, d, h, g, p, pa, st) OUTER,
    C BY (s, d, h, g, p, pa, st) OUTER;
OUTPUT = FOREACH J GENERATE
    FLATTEN(group) as (s, d, h, g, p, pa, st),
    COUNT_STAR(C) as c,
    COUNT_STAR(V) as v,
    SUM(AT.p1) as p1,
    SUM(AT.p2) as p2,
    SUM(AT.p3) as p3,
    SUM(UC.q) as ucq,
    SUM(UC.r) as ucr,
    SUM(UV.q) as uvq,
    SUM(UV.r) as uvr;
"