  1. Pig
  2. PIG-514

COUNT returns no results as a result of two filter statements in FOREACH


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.3.0
    • Component/s: impl
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The following sample script uses a nested FOREACH to count student records filtered on record_type == 1 AND age == scores, and separately to sum and count records filtered on record_type == 0. It does not return any results.

      mydata = LOAD 'mystudentfile.txt' AS  (record_type,name,age,scores,gpa);
      --keep only what we need
      mydata_filtered = FOREACH  mydata GENERATE   record_type,  name,  age,  scores ;
      --group
      mydata_grouped = GROUP mydata_filtered BY  (record_type,age);
      
      myfinaldata = FOREACH mydata_grouped {
            myfilter1 = FILTER mydata_filtered BY record_type == 1 AND age == scores;
            myfilter2 = FILTER mydata_filtered BY record_type == 0;
            GENERATE FLATTEN(group),
                  -- Only this count causes the problem
                  COUNT(myfilter1) as col2,
                  SUM(myfilter2.scores) as col3,
                  COUNT(myfilter2) as col4;
      };
      
      --this set of statements confirms that the count on myfilter1 returns 1
      --mycountdata = FOREACH mydata_grouped
      --{
      --      myfilter1 = FILTER mydata_filtered BY record_type == 1 AND age == scores;
      --      GENERATE
      --      COUNT(myfilter1) as colcount;
      --};
      --dump mycountdata;
      
      dump myfinaldata;
      

      But if you comment out the line

       COUNT(myfilter1) as col2, 

      the script works and returns the following results:
      (0,22,45.0,2L)
      (0,24,133.0,6L)
      (0,25,22.0,1L)

      I have also tried to verify whether the problem is that

       COUNT(myfilter1) as col2, 

      returns zero. That does not seem to be the case. If the commented-out mycountdata block, ending in

        dump mycountdata; 

      is uncommented, it returns:
      (1L)
      (1L)

      I am attaching the tab-separated 'mystudentfile.txt' file used in this Pig script. Is this an issue with having two filters in the FOREACH, each followed by a COUNT on the filtered result?
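
      One way to narrow the question down might be a reduced script that keeps both filters but generates only the two counts. This is a minimal sketch, not something run against 0.2.0: it assumes the same input file and schema as above, and the aliases mynarrowed, cnt1 and cnt2 are hypothetical names introduced here for illustration.

      ```pig
      -- Same load, projection and grouping as in the script above
      mydata = LOAD 'mystudentfile.txt' AS (record_type, name, age, scores, gpa);
      mydata_filtered = FOREACH mydata GENERATE record_type, name, age, scores;
      mydata_grouped = GROUP mydata_filtered BY (record_type, age);

      -- Hypothetical reduced case: two nested filters, one COUNT on each, no SUM
      mynarrowed = FOREACH mydata_grouped {
            myfilter1 = FILTER mydata_filtered BY record_type == 1 AND age == scores;
            myfilter2 = FILTER mydata_filtered BY record_type == 0;
            GENERATE FLATTEN(group),
                  COUNT(myfilter1) AS cnt1,
                  COUNT(myfilter2) AS cnt2;
      };

      DUMP mynarrowed;
      ```

      If this reduced script also returns nothing, the combination of two filters with a COUNT on each would be the likely trigger; if it returns rows, the interaction with SUM(myfilter2.scores) would be worth checking next.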

      Attachments

        1. mystudentfile.txt
          0.2 kB
          Viraj Bhat
        2. PIG-514.patch
          98 kB
          Pradeep Kamath

            People

              Assignee: Pradeep Kamath (pkamath)
              Reporter: Viraj Bhat (viraj)
              Votes: 6
              Watchers: 1
