PIG-2374: streaming regression with dotNext

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.2
    • Component/s: None
    • Labels:
    • Environment:
    • Hadoop Flags: Reviewed

      Description

      Streaming appears to be broken in dotNext; several tests are failing.
      In the script below, the output of C is clean.
      The output of D, which is C streamed through CMD, contains control characters in some records.

      define CMD `perl GroupBy.pl '\t' 0` ship('/homes/monster/pigtest/pigtest_next/pigharness/dist/pig_harness/libexec/PigTest/GroupBy.pl');
      A = load '/user/user1/pig/tests/data/singlefile/studenttab10k';
      B = group A by $0;
      C = foreach B generate flatten(A);
      D = stream C through CMD;
      store C into '/user/user1/pig/out/user1.1321117428/ComputeSpec_7_C.out';
      store D into '/user/user1/pig/out/user1.1321117428/ComputeSpec_7_D.out';

      Other streaming tests that fail with control characters:
      TEST FAILED <ComputeSpec_7>
      TEST FAILED <ComputeSpec_8>
      TEST FAILED <ComputeSpec_10>
      TEST FAILED <ComputeSpec_11>
      TEST FAILED <ComputeSpec_12>
      TEST FAILED <JobManagement_2>
      TEST FAILED <JobManagement_3>
      TEST FAILED <StreamingIO_4>
      TEST FAILED <NonStreaming_1>
      TEST FAILED <MultiQuery_21>
      ...

          Activity

          Daniel Dai added a comment -

          This is caused by HADOOP-6109 (0.21 and beyond). After HADOOP-6109, Text.getBytes() can return a byte array that is longer than the Text's length; only the first Text.getLength() bytes are valid. In Pig code, at OutputHandler:92, we use the byte array and ignore the length. We need to either:
          1. Ask Hadoop to roll back HADOOP-6109
          2. Hunt down every place in Pig where we use getBytes() but ignore the length
          Daniel Dai added a comment -

          PIG-2374-1.patch uses approach 2.

          Daniel Dai added a comment -

          Unit tests pass. No tests are included because the current e2e tests already cover this case.

          Thejas M Nair added a comment -

          +1

          Daniel Dai added a comment -

          Unit tests pass. test-patch:
          [exec] -1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
          [exec] Please justify why no tests are needed for this patch.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] -1 release audit. The applied patch generated 463 release audit warnings (more than the trunk's current 456 warnings).

          No tests are included since this is a regression. No new files were added, so the release audit warning can be ignored.

          Patch committed to trunk/0.10/0.9.

          Ashutosh Chauhan added a comment -

          We should push for backward compatibility of getBytes() on Hadoop for this. The way it is fixed with this patch will necessitate an extra buffer copy in Pig, an unnecessary performance hit.
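          The cost being discussed can be illustrated with a minimal sketch. It assumes, hypothetically, a caller that holds a backing array plus a valid length; the class and method names below are invented for illustration and are not Pig or Hadoop APIs. A consumer that needs an exactly sized byte[] forces a copy per record, while a consumer that accepts an offset and length can read the valid prefix in place. The thread does not say which form Pig's call sites can take.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Pig or Hadoop.
class LengthAwareWrite {
    // Copying variant: allocates a right-sized array for every record.
    static byte[] copyValid(byte[] backing, int length) {
        return Arrays.copyOf(backing, length);
    }

    // Non-copying variant on the caller's side: passes the backing array
    // together with the number of valid bytes to a length-aware sink.
    static void writeValid(ByteArrayOutputStream sink, byte[] backing, int length) {
        sink.write(backing, 0, length);
    }

    public static void main(String[] args) {
        // Three valid bytes followed by spare capacity.
        byte[] backing = {'a', 'b', 'c', 0, 0, 0};
        int length = 3;

        byte[] exact = copyValid(backing, length);
        System.out.println(exact.length);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeValid(sink, backing, length);
        System.out.println(sink.size());
    }
}
```

          Both calls expose only the three valid bytes; the difference is whether an intermediate array is allocated on the caller's side.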

          Daniel Dai added a comment -

          Yes, this is a break of contract and might hit other projects as well.

          Olga Natkovich added a comment -

          I think Ashutosh is bringing up a really good point. We seem to always fix things in Pig because, understandably, it is easier for us. However, if Hadoop is breaking the contract, they should fix this, especially if we have to pay a performance penalty for it.


            People

            • Assignee: Daniel Dai
            • Reporter: Araceli Henley