With the changes in this patch, queries that use algebraic functions within expressions will also use the combiner. This holds as long as the bags produced by group-by are used only as input to algebraic functions. If a bag is projected, or a non-algebraic expression/UDF takes a bag as input, the combiner is not used.
The combiner is used for the following foreach statements (that follow a group):
describe B ;
B: {group: int, A: {c1 : int, c2 : int, c3 : int}}
1) foreach B generate SUM(A.c2) * AVG(A.c3), ...
2) foreach B generate 1 / SUM(A.c2)
3) foreach B generate EXP(AVG(A.c2))
4) foreach B generate group + SUM(A.c2)
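To illustrate why expressions like these can run through the combiner, here is a minimal Python sketch of statement 1, SUM(A.c2) * AVG(A.c3). It is not Pig's implementation: the function names initial, intermediate and final are borrowed loosely from the three stages of an algebraic function, and the sample rows and the two-way split are assumptions made up for the example. The point is only that each map-side split can be reduced to a small partial state, and the arithmetic on top of the aggregates is applied once at the end.

```python
# Sketch of partial aggregation for SUM(A.c2) * AVG(A.c3).
# All names and data here are illustrative, not Pig internals.

def initial(row):
    # row is (c1, c2, c3); emit a partial state (sum_c2, sum_c3, count).
    _, c2, c3 = row
    return (c2, c3, 1)

def intermediate(partials):
    # Combiner step: merge any number of partial states into one.
    s2 = sum(p[0] for p in partials)
    s3 = sum(p[1] for p in partials)
    n = sum(p[2] for p in partials)
    return (s2, s3, n)

def final(partials):
    # Reducer step: merge what is left, then evaluate the expression once.
    s2, s3, n = intermediate(partials)
    return s2 * (s3 / n)

# One group (c1 == 1) whose rows are spread over two hypothetical map splits.
rows = [(1, 2, 10), (1, 4, 20), (1, 6, 30), (1, 8, 40)]
split_a = intermediate([initial(r) for r in rows[:2]])
split_b = intermediate([initial(r) for r in rows[2:]])
combined_result = final([split_a, split_b])

# Reference: evaluate the expression directly over the whole bag.
direct_result = sum(r[1] for r in rows) * (sum(r[2] for r in rows) / len(rows))
```

Both paths give the same value, but only the small partial states would have to cross the shuffle in the combined path.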
The following statements will not use the combiner:
1) foreach B generate A.c2, ...
2) foreach B generate EXP(A.c2), SUM(A.c2), ... (where EXP is a non-algebraic function)
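The rule behind the two lists above can be sketched as a small check over an expression tree. This is a hedged illustration in Python, not the optimizer code from the patch: the tuple-based expression model, the helper names, and the ALGEBRAIC set are all assumptions introduced for the example. It encodes only the condition stated in this note, namely that every bag must be a direct input to an algebraic function.

```python
# Hypothetical expression model; not Pig's logical plan classes.
ALGEBRAIC = {"SUM", "AVG", "COUNT", "MIN", "MAX"}  # assumed set for this sketch

def bag(name): return ("bag", name)
def const(value): return ("const", value)
def group_key(): return ("group",)
def call(fn, *args): return ("call", fn, list(args))
def binop(op, left, right): return ("binop", op, left, right)

def bags_only_feed_algebraic(expr, parent_is_algebraic=False):
    kind = expr[0]
    if kind == "bag":
        # A bag is acceptable only as a direct argument of an algebraic function.
        return parent_is_algebraic
    if kind in ("const", "group"):
        return True
    if kind == "call":
        is_algebraic = expr[1] in ALGEBRAIC
        return all(bags_only_feed_algebraic(a, is_algebraic) for a in expr[2])
    if kind == "binop":
        return all(bags_only_feed_algebraic(e) for e in (expr[2], expr[3]))
    raise ValueError("unknown expression kind: %r" % (kind,))

def combiner_eligible(generate_exprs):
    # Every expression in the generate list must satisfy the rule.
    return all(bags_only_feed_algebraic(e) for e in generate_exprs)

used_1 = combiner_eligible([binop("*", call("SUM", bag("A.c2")), call("AVG", bag("A.c3")))])
used_2 = combiner_eligible([binop("/", const(1), call("SUM", bag("A.c2")))])
used_3 = combiner_eligible([call("EXP", call("AVG", bag("A.c2")))])
used_4 = combiner_eligible([binop("+", group_key(), call("SUM", bag("A.c2")))])
not_used_1 = combiner_eligible([bag("A.c2")])
not_used_2 = combiner_eligible([call("EXP", bag("A.c2")), call("SUM", bag("A.c2"))])
```

Under these assumptions the four "used" cases evaluate to True and the two "not used" cases to False. The nested foreach restriction (limit, order, filter) is not modeled here.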
In the case of a nested foreach statement, if it contains limit, order, or filter, the combiner is not used (as before).
This patch also fixes PIG-490: foreach statements that access elements of the group now also use the combiner.
For example:
1) foreach B generate group.$0, group.$1, COUNT(A);
2) foreach B generate group.c1, group.c2, COUNT(A);