Pig / PIG-232

Number of input/output rows in the logs is invalid with BinaryStorage

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0
    • Component/s: None
    • Labels:
      None

      Description

      My pig script:

      define CMD `perl PigStreamingBad.pl end` ship('PigStreamingBad.pl') stderr('CMD' limit 1);
      A = load 'studenttab10k';
      B = stream A through CMD;
      store B into 'out';

      My perl script:

      use strict;

      # This script is used to test streaming error cases in pig.
      # Usage: PigStreaming.pl <start|middle|end>
      # the parameter tells the application when to exit with error

      if ($#ARGV < 0)
      {
          print STDERR "Usage PigStreaming.pl <start|middle|end>\n";
          exit (-1);
      }

      my $pos = $ARGV[0];

      if ($pos eq "start")
      {
          print STDERR "Failed in the beginning of the processing\n";
          exit(1);
      }

      print STDERR "PigStreamingBad.pl: starting processing\n";

      my $cnt = 0;
      while (<STDIN>)
      {
          print "$_";
          $cnt++;
          print STDERR "PigStreaming.pl: processing $_\n";
          if (($cnt > 100) && ($pos eq "middle"))
          {
              print STDERR "Failed in the middle of processing\n";
              exit(2);
          }
      }

      print STDERR "Failed at the end of processing\n";
      exit(3);

      1. PIG-232_0_20080507.patch
        0.9 kB
        Arun C Murthy
      2. PIG-232_1_20080507.patch
        0.9 kB
        Arun C Murthy
      3. PIG-232_2_20080508.patch
        7 kB
        Arun C Murthy

        Activity

        Olga Natkovich added a comment -

        patch committed. thanks, arun

        Arun C Murthy added a comment -

        Patch to take care of "#" in the cache-spec; also added a test-case for cache specs.

        Olga Natkovich added a comment -

        This patch broke the cache statement - it now always claims that it is invalid. This is because #name is not stripped.

        Olga Natkovich added a comment -

        second patch committed as well

        Arun C Murthy added a comment -

        Patch to fix input-records as well...

        Olga Natkovich added a comment -

        I committed the patch. It fixes the number of output rows. The same issue needs to be fixed for input rows as well.

        Arun C Murthy added a comment -

        Better, simpler, smaller patch...

        Arun C Murthy added a comment -

        Patch isn't correct, needs to be fixed.

        Arun C Murthy added a comment -

        Patch to not show the #output-records when BinaryStorage is being used...

        Olga Natkovich added a comment -

        Ok, when this happens, can we report "not known" or some such thing rather than giving an invalid value?

        Arun C Murthy added a comment - edited

        Olga, this is due to the fact that the stream/store optimization is kicking in and hence only the 'binary tuples' are being reported... could you please try by switching off the optimization?

        /pig/studenttab10k has 10,000 records.

        Now:

        IP = load '/pig/studenttab10k';
        OP = stream IP through `perl -ne 'print $_;'`; 
        store OP into '/pig/out' using PigStorage(',');
        

        correctly shows 10,000 as the no. of output-records while:

        IP = load '/pig/studenttab10k';
        OP = stream IP through `perl -ne 'print $_;'`; 
        store OP into '/pig/out';
        

        shows the no. of output-records as 4 due to the stream/store optimization.

        Could you please re-check? Thanks!
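
One way to picture the discrepancy described above is the following minimal Python sketch. It is not Pig's implementation. It assumes, purely for illustration, that the optimized path hands data through as opaque fixed-size byte chunks, so a counter sees chunks rather than rows. The function names, the sample rows, and the 64 KB chunk size are all hypothetical.

```python
import io


def count_text_records(data: bytes) -> int:
    """Count newline-terminated rows, as a line-oriented store would see them."""
    return sum(1 for _ in io.BytesIO(data))


def count_binary_chunks(data: bytes, chunk_size: int) -> int:
    """Count opaque fixed-size chunks, ignoring row boundaries (hypothetical model)."""
    return (len(data) + chunk_size - 1) // chunk_size


def report(count) -> str:
    """Return a printable count, or 'not known' when no meaningful count exists."""
    return "not known" if count is None else str(count)


# 10,000 made-up rows standing in for the studenttab10k input.
rows = b"".join(b"name%d\t20\t3.5\n" % i for i in range(10_000))

text_count = count_text_records(rows)                          # one per row
chunk_count = count_binary_chunks(rows, chunk_size=64 * 1024)  # one per chunk

print("line-oriented count:", report(text_count))
print("chunk-oriented count:", report(chunk_count))
print("binary passthrough, rows not tracked:", report(None))
```

Under this model the chunk count depends only on the byte volume and the chunk size, so it says nothing about the number of rows. That is the motivation for reporting "not known" instead of a misleading figure.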

        Olga Natkovich added a comment -

        I see the same behavior with valid queries as well


          People

          • Assignee:
            Arun C Murthy
            Reporter:
            Olga Natkovich
          • Votes:
            0
            Watchers:
            0
