Pig / PIG-2610

GC errors on using FILTER within nested FOREACH

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.9.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      A user reported GC overhead errors when using FILTER inside a nested FOREACH and aggregating the filtered fields. Below is the sample Pig Latin script the user provided that triggered the errors.

      raw = LOAD 'input' using MyCustomLoader();
      
      searches = FOREACH raw GENERATE
                     day, searchType,
                     FLATTEN(impBag) AS (adType, clickCount);
      
      groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
      counts = FOREACH groupedSearches {
                     type1 = FILTER searches BY adType == 'type1';
                     type2 = FILTER searches BY adType == 'type2';
                     GENERATE
                         FLATTEN(group) AS (day, searchType),
                         COUNT(searches) AS numSearches,
                         SUM(searches.clickCount) AS clickCountPerSearchType,
                         SUM(type1.clickCount) AS type1ClickCount,
                         SUM(type2.clickCount) AS type2ClickCount;
             };
      

      Pig should be able to run this script without hitting GC overhead errors.
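      One possible workaround, sketched here under assumptions rather than as a confirmed fix: move the per-type condition out of the nested block by computing per-record type1/type2 click columns with Pig's bincond operator (`? :`) before grouping, so the grouped FOREACH applies only COUNT and SUM. Pig generally uses the combiner only for a non-nested FOREACH whose outputs are group fields or algebraic functions, so this form should let partial sums be computed map-side instead of holding each group's bag for the nested FILTERs. The sketch assumes `clickCount` can be cast to `long`; the aliases `tagged`, `type1Clicks` and `type2Clicks` are hypothetical names chosen for illustration.

```pig
-- Hypothetical workaround sketch, not a confirmed fix for PIG-2610.
raw = LOAD 'input' using MyCustomLoader();

searches = FOREACH raw GENERATE
               day, searchType,
               FLATTEN(impBag) AS (adType, clickCount);

-- Evaluate the ad-type condition per record instead of in a nested FILTER:
-- each record contributes its clicks to the matching column and 0L otherwise.
-- Assumes clickCount is numeric and castable to long.
tagged = FOREACH searches GENERATE
             day, searchType,
             (long)clickCount AS clickCount,
             (adType == 'type1' ? (long)clickCount : 0L) AS type1Clicks,
             (adType == 'type2' ? (long)clickCount : 0L) AS type2Clicks;

groupedSearches = GROUP tagged BY (day, searchType) PARALLEL 50;

-- Non-nested FOREACH that outputs only group fields and algebraic
-- functions (COUNT, SUM), which makes it eligible for the combiner.
counts = FOREACH groupedSearches GENERATE
             FLATTEN(group) AS (day, searchType),
             COUNT(tagged) AS numSearches,
             SUM(tagged.clickCount) AS clickCountPerSearchType,
             SUM(tagged.type1Clicks) AS type1ClickCount,
             SUM(tagged.type2Clicks) AS type2ClickCount;
```

      One behavioral difference to check: in the original script, a group with no matching ad type sums an empty filtered bag, which may yield null, whereas this version yields 0 for that group.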


    People

    • Assignee: Unassigned
    • Reporter: Prashant Kommireddi
    • Votes: 0
    • Watchers: 1
