Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-710

Filtering bag in nested foreach does not produce expected results

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 0.3.0
    • Component/s: None
    • Labels:
      None

      Description

      I have an idiom I used to use in older versions of pig (prior to types branch) which would group into a collection and then filter the output if any of the collection contained a particular string.

      This relies on FILTER statements within a FOREACH ...

      { ... GENERATE ... } statement.

      ORDER ... BY in the FOREACH ... { ... GENERATE ... }

      statement does not seem to have a problem so it seems to be something isolated to the FILTER.

      A = load 'filterbug.data' using PigStorage() as ( id, str );
      
      B = group A by ( id );
      describe B;
      dump B;
      
      D = foreach B generate
              group,
              COUNT(A),
              A.str;
      describe D;
      dump D;
      
      C = foreach B {
              D = order A by str;
              matchedcount = COUNT(D);
              generate
                      group,
                      matchedcount as matchedcount,
                      D.str;
              };
      describe C;
      dump C;
      
      Cfiltered = foreach B {
              D = filter A by (
                      str matches 'hello'
                      );
              matchedcount = COUNT(D);
              generate
                      group,
                      matchedcount as matchedcount,
                      A.str;
              };
      describe Cfiltered;
      dump Cfiltered;
      

      Here's the output:

      -bash-3.00$ pig -exectype local -latest filterbug.pig
      USING: /grid/0/gs/pig/current
      
      B: {group: bytearray,A: {id: bytearray,str: bytearray}}
      2009-03-10 03:14:14,838 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
      2009-03-10 03:14:14,839 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
      (a,{(a,hello),(a,goodbye)})
      (b,{(b,goodbye)})
      (c,{(c,hello),(c,hello),(c,hello)})
      (d,{(d,what)})
      
      D: {group: bytearray,long,str: {str: bytearray}}
      2009-03-10 03:14:14,920 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
      2009-03-10 03:14:14,920 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
      (a,2L,{(hello),(goodbye)})
      (b,1L,{(goodbye)})
      (c,3L,{(hello),(hello),(hello)})
      (d,1L,{(what)})
      
      C: {group: bytearray,matchedcount: long,str: {str: bytearray}}
      2009-03-10 03:14:14,985 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
      2009-03-10 03:14:14,985 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
      (a,2L,{(goodbye),(hello)})
      (b,1L,{(goodbye)})
      (c,3L,{(hello),(hello),(hello)})
      (d,1L,{(what)})
      2009-03-10 03:14:15,018 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
      
      Cfiltered: {group: bytearray,matchedcount: long,str: {str: bytearray}}
      2009-03-10 03:14:15,044 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
      2009-03-10 03:14:15,057 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
      2009-03-10 03:14:15,057 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
      (a,1L,{(hello),(goodbye)})
      

      What I expect for the output of Cfiltered is actually:

      (a,1L,

      {(hello),(goodbye)}

      )
      (b,0L,

      {(goodbye)}

      )
      (c,3L,

      {(hello),(hello),(hello)}

      )
      (d,0L,

      {(what)}

      )

      The data file is:

      a       hello
      a       goodbye
      b       goodbye
      c       hello
      c       hello
      c       hello
      d       what
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                pkamath Pradeep Kamath
                Reporter:
                ciemo David Ciemiewicz
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: