Pig / PIG-232

Number of input/output rows in the logs is invalid with BinaryStorage

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0
    • Component/s: None
    • Labels: None

      Description

      My pig script:

      define CMD `perl PigStreamingBad.pl end` ship('PigStreamingBad.pl') stderr('CMD' limit 1);
      A = load 'studenttab10k';
      B = stream A through CMD;
      store B into 'out';

      My perl script:

      use strict;

      # This script is used to test streaming error cases in pig.
      # Usage: PigStreaming.pl <start|middle|end>
      # the parameter tells the application when to exit with error

      if ($#ARGV < 0)
      {
      print STDERR "Usage PigStreaming.pl <start|middle|end>\n";
      exit (-1);
      }

      my $pos = $ARGV[0];

      if ($pos eq "start")
      {
      print STDERR "Failed in the beginning of the processing\n";
      exit(1);
      }

      print STDERR "PigStreamingBad.pl: starting processing\n";

      my $cnt = 0;
      while (<STDIN>)
      {
      print "$_";
      $cnt++;
      print STDERR "PigStreaming.pl: processing $_\n";
      if (($cnt > 100) && ($pos eq "middle"))
      {
      print STDERR "Failed in the middle of processing\n";
      exit(2);
      }

      }

      print STDERR "Failed at the end of processing\n";
      exit(3);

      1. PIG-232_2_20080508.patch
        7 kB
        Arun C Murthy
      2. PIG-232_1_20080507.patch
        0.9 kB
        Arun C Murthy
      3. PIG-232_0_20080507.patch
        0.9 kB
        Arun C Murthy

        Activity

        Transition                        Time In Source Status   Execution Times   Last Executor     Last Execution Date
        Patch Available -> Open           2h 30m                  1                 Arun C Murthy     08/May/08 00:21
        Open -> Patch Available           20h 18m                 2                 Arun C Murthy     08/May/08 00:22
        Resolved -> Reopened              22h 3m                  1                 Olga Natkovich    09/May/08 00:08
        Reopened -> Patch Available       1h 23m                  1                 Arun C Murthy     09/May/08 01:31
        Patch Available -> Resolved       17h 29m                 2                 Olga Natkovich    09/May/08 17:19
        Resolved -> Closed                684d 5h 42m             1                 Alan Gates        24/Mar/10 22:01
        Alan Gates made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Olga Natkovich made changes -
        Fix Version/s 0.1.0 [ 12312848 ]
        Olga Natkovich made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Olga Natkovich added a comment -

        patch committed. thanks, arun

        Arun C Murthy made changes -
        Status Reopened [ 4 ] Patch Available [ 10002 ]
        Arun C Murthy made changes -
        Attachment PIG-232_2_20080508.patch [ 12381729 ]
        Arun C Murthy added a comment -

        Patch to take care of "#" in the cache-spec; also added a test-case for cache specs.

        Olga Natkovich made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Olga Natkovich added a comment -

        This patch broke the cache statement: it now always claims that it is invalid. This is because #name is not stripped.

        Olga Natkovich made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Olga Natkovich added a comment -

        second patch committed as well

        Arun C Murthy made changes -
        Attachment PIG-232_1_20080507.patch [ 12381649 ]
        Arun C Murthy added a comment -

        Patch to fix input-records as well...

        Olga Natkovich made changes -
        Summary: "Number of output rows in the log seems to be invalid" -> "Number of input/output rows in the logs is invalid with BinaryStorage"
        Olga Natkovich added a comment -

        I committed the patch. It fixes the number of output rows. The same issue needs to be fixed for input rows as well.

        Arun C Murthy made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Arun C Murthy made changes -
        Attachment PIG-232_0_20080507.patch [ 12381644 ]
        Arun C Murthy added a comment -

        Better, simpler, smaller patch...

        Arun C Murthy made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Arun C Murthy added a comment -

        Patch isn't correct, needs to be fixed.

        Arun C Murthy made changes -
        Attachment PIG-232_0_20080507.patch [ 12381630 ]
        Arun C Murthy made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Arun C Murthy made changes -
        Attachment PIG-232_0_20080507.patch [ 12381630 ]
        Arun C Murthy added a comment -

        Patch to not show the #output-records when BinaryStorage is being used...

        Olga Natkovich added a comment -

        OK, when this happens, can we report "not known" or some such thing rather than giving an invalid value?

        Arun C Murthy added a comment (edited) -

        Olga, this is due to the fact that the stream/store optimization is kicking in and hence only the 'binary tuples' are being reported... could you please try by switching off the optimization?

        /pig/studenttab10k has 10,000 records.

        Now:

        IP = load '/pig/studenttab10k';
        OP = stream IP through `perl -ne 'print $_;'`; 
        store OP into '/pig/out' using PigStorage(',');
        

        correctly shows 10,000 as the no. of output-records while:

        IP = load '/pig/studenttab10k';
        OP = stream IP through `perl -ne 'print $_;'`; 
        store OP into '/pig/out';
        

        shows the no. of output-records as 4 due to the stream/store optimization.

        Could you please re-check? Thanks!

        Olga Natkovich made changes -
        Field Original Value New Value
        Summary: "Number of outpit rows in the log seems to be invalid when streaming application fails" -> "Number of output rows in the log seems to be invalid"
        Olga Natkovich added a comment -

        I see the same behavior with valid queries as well

        Olga Natkovich created issue -

          People

          • Assignee:
            Arun C Murthy
            Reporter:
            Olga Natkovich
          • Votes:
            0
            Watchers:
            0
