Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.10.1
Fix Version/s: None
Component/s: None
Labels: None
Description
The CombinerOptimizer is not applied to the script below. As a result, all aggregation work is done in the reducer(s), which severely degrades performance. Removing one of the STORE statements, or refactoring the query to use a single FOREACH after the GROUP, allows the CombinerOptimizer to work.
%declare DUMMY `bash -c '(for (( i=0; \$i < 10; i++ )); do echo \$i 5; done) | hadoop fs -put - /tmp/test_data.tsv; true'`

s = LOAD '/tmp/test_data.tsv' USING PigStorage(' ') AS (n:long, g:long);
grouped = GROUP s BY g;
counted = FOREACH grouped GENERATE flatten($0), COUNT_STAR($1);
STORE counted INTO '/tmp/test_count';
summed = FOREACH grouped GENERATE flatten($0), SUM($1.n);
STORE summed INTO '/tmp/test_sum';

FS -rmr /tmp/test_{data.tsv,count,sum}
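
For reference, the "single FOREACH after the group" workaround mentioned in the description might look roughly like the sketch below. This is one possible reading of the workaround, not a confirmed fix: the relation name `agg` and the field names `cnt` and `total` are made up for illustration, and whether the combiner is actually used should be checked against the plan printed by EXPLAIN.

```pig
s = LOAD '/tmp/test_data.tsv' USING PigStorage(' ') AS (n:long, g:long);
grouped = GROUP s BY g;

-- Compute both aggregates in one FOREACH directly after the GROUP.
-- 'agg', 'cnt' and 'total' are hypothetical names, not from the original script.
agg = FOREACH grouped GENERATE FLATTEN(group) AS g, COUNT_STAR(s) AS cnt, SUM(s.n) AS total;

-- Project the two outputs from the already-aggregated relation.
counted = FOREACH agg GENERATE g, cnt;
summed = FOREACH agg GENERATE g, total;

STORE counted INTO '/tmp/test_count';
STORE summed INTO '/tmp/test_sum';
```

Running `EXPLAIN agg;` before the STORE statements prints the MapReduce plan, which shows whether a combine plan is present for the job that performs the grouping.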