Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
0.9.1
-
None
-
None
-
None
Description
User has reported running into GC overhead errors while trying to use FILTER within FOREACH and aggregating the filtered field. Here is the sample PigLatin script provided by the user that generated this issue.
raw = LOAD 'input' using MyCustomLoader(); searches = FOREACH raw GENERATE day, searchType, FLATTEN(impBag) AS (adType, clickCount) ; groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50; counts = FOREACH groupedSearches{ type1 = FILTER searches BY adType == 'type1'; type2 = FILTER searches BY adType == 'type2'; GENERATE FLATTEN(group) AS (day, searchType), COUNT(searches) numSearches, SUM(clickCount) AS clickCountPerSearchType, SUM(type1.clickCount) AS type1ClickCount, SUM(type2.clickCount) AS type2ClickCount; };
Pig should be able to handle this case.