PIG-2610

GC errors when using FILTER within nested FOREACH


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.9.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      A user reported GC overhead errors when using FILTER inside a nested FOREACH and aggregating a field of the filtered bags. The following sample Pig Latin script, provided by the user, reproduces the issue.

      raw = LOAD 'input' using MyCustomLoader();
      
      searches = FOREACH raw GENERATE
                     day, searchType,
                     FLATTEN(impBag) AS (adType, clickCount)
                 ;
      
      groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
      counts = FOREACH groupedSearches{
                     type1 = FILTER searches BY adType == 'type1';
                     type2 = FILTER searches BY adType == 'type2';
                     GENERATE
                         FLATTEN(group) AS (day, searchType),
                         COUNT(searches) AS numSearches,
                         SUM(searches.clickCount) AS clickCountPerSearchType,
                         SUM(type1.clickCount) AS type1ClickCount,
                         SUM(type2.clickCount) AS type2ClickCount;
             };
      

      Pig should be able to run this script without hitting GC overhead errors.
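
      Until this is addressed, one possible workaround is sketched below. It assumes the errors come from each reducer having to hold a group's whole bag in order to evaluate the nested FILTERs, which has not been confirmed in this issue. The per-type click counts are computed per row with a conditional expression before the GROUP, so the final FOREACH contains only COUNT and SUM over plain projections. The aliases flagged, type1Clicks and type2Clicks are hypothetical names introduced for illustration, and the (long) casts assume clickCount is numeric.

```pig
raw = LOAD 'input' USING MyCustomLoader();

searches = FOREACH raw GENERATE
               day, searchType,
               FLATTEN(impBag) AS (adType, clickCount);

-- Compute the per-type values row by row, so no nested FILTER is needed later.
-- Assumption: clickCount can be cast to long.
flagged = FOREACH searches GENERATE
              day, searchType,
              (long)clickCount AS clickCount,
              (adType == 'type1' ? (long)clickCount : 0L) AS type1Clicks,
              (adType == 'type2' ? (long)clickCount : 0L) AS type2Clicks;

groupedSearches = GROUP flagged BY (day, searchType) PARALLEL 50;

-- Only aggregate functions over simple projections of the grouped bag remain.
counts = FOREACH groupedSearches GENERATE
             FLATTEN(group) AS (day, searchType),
             COUNT(flagged) AS numSearches,
             SUM(flagged.clickCount) AS clickCountPerSearchType,
             SUM(flagged.type1Clicks) AS type1ClickCount,
             SUM(flagged.type2Clicks) AS type2ClickCount;
```

      One difference from the original script: for a group with no rows of a given adType, this version yields 0 for that type's click count, whereas SUM over an empty filtered bag yields null.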

          People

            Assignee: Unassigned
            Reporter: Prashant Kommireddi (prkommireddi)