PIG-2534: Pig generating infinite map outputs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.1
    • Fix Version/s: 0.10.0, 0.9.3, 0.11
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I am seeing strange behavior from Pig 0.9 with the script below.

      event_serve = LOAD 'input1'   AS (s, m, l);
      cm_data_raw = LOAD 'input2'  AS (s, m, l);
      
      SPLIT cm_data_raw INTO
          cm_serve_raw IF (( (chararray) (s#'key1') == '0') AND ( (chararray) (s#'key2') == '5')),
          cm_click_raw IF (( (chararray) (s#'key1') == '1') AND ( (chararray) (s#'key2') == '5'));
      
      cm_serve = FOREACH cm_serve_raw GENERATE  s#'key3' AS f1,  s#'key4' AS f2, s#'key5' AS f3 ;
      cm_serve_lowercase = stream cm_serve through `echo val3`;
      
      cm_serve_final = FOREACH cm_serve_lowercase GENERATE  $0 AS cm_event_guid, $1 AS cm_receive_time, $2 AS cm_ctx_url;
      
      event_serve_filtered = FILTER event_serve BY  (chararray) (s#'key1') neq 'xxx' AND (chararray) (s#'key2') neq 'yyy' ;
      
      event_serve_project = FOREACH event_serve_filtered GENERATE  s#'key3' AS event_guid, s#'key4' AS receive_time;
      
      event_serve_join = join cm_serve_final by (cm_event_guid),
          event_serve_project by (event_guid);
      
      store event_serve_join into 'somewhere';
      

      Input (input1 and input2 are the same):

      key1#0,key2#5,key3#val3,key4#val4,key5#val5

      If I run this Pig script with ColumnMapKeyPrune disabled, the job goes through fine and one output record is created.
      But if I run the script with default settings, it keeps generating map output records infinitely.
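      For reference, individual optimizer rules can be switched off when launching the script; assuming Pig's standard `-t` (`-optimizer_off`) command-line option and a script file name of `script.pig` (the file name is illustrative), the non-failing run described above would look like:

      ```
      pig -t ColumnMapKeyPrune script.pig
      ```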

      1. PIG-2534-2.patch
        4 kB
        Daniel Dai
      2. PIG-2534-1.patch
        3 kB
        Daniel Dai

        Issue Links

          Activity

          Transition           Time In Source Status   Execution Times   Last Executer   Last Execution Date
          Open -> Resolved     14d 11h 25m             1                 Daniel Dai      01/Mar/12 07:22
          Resolved -> Closed   56d 13h 10m             1                 Daniel Dai      26/Apr/12 20:33
          Aniket Mokashi made changes -
          Link This issue relates to PIG-2566 [ PIG-2566 ]
          Daniel Dai made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Daniel Dai added a comment -

          Opened PIG-2566 to track Dmitriy's comment.

          Daniel Dai made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Assignee Daniel Dai [ daijy ]
          Fix Version/s 0.10 [ 12316246 ]
          Fix Version/s 0.9.3 [ 12319456 ]
          Fix Version/s 0.11 [ 12318878 ]
          Resolution Fixed [ 1 ]
          Daniel Dai added a comment -

          Unit test pass. test-patch:
          [exec] -1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec]
          [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] -1 release audit. The applied patch generated 533 release audit warnings (more than the trunk's current 530 warnings).

          The javadoc and release audit warnings are unrelated.

          Patch committed to 0.9/0.10/trunk

          Daniel Dai made changes -
          Attachment PIG-2534-2.patch [ 12516650 ]
          Daniel Dai added a comment -

          Attaching PIG-2534-2.patch to add comments to setOutputUids and fix a Findbugs warning. As for the refactoring, we can certainly do it, but since that is lower priority, I will open a separate ticket to address it.

          Dmitriy V. Ryaboy added a comment -

          It throws an exception as part of normal flow control? Can we refactor that?

          Thejas M Nair added a comment -

          +1
          I think we should add a javadoc comment to the setOutputUids function, saying that it checks for a null schema and throws an exception to stop column pruning from happening. We should also consider moving such checks (like the null-schema check) that need to happen on all logical operators to a separate visitor (a subclass of AllLogicalExpressionVisitor?), which gets called from Transformer.check().
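          A minimal sketch of the suggested comment, with a hypothetical signature (only the name setOutputUids comes from this thread; the parameter and exception types are illustrative, not Pig's actual declaration):

          /**
           * Records which output uids (column identifiers) the operator must
           * still produce after pruning.
           *
           * Flow-control note: a null schema means the operator's output layout
           * is unknown (e.g. STREAM without an AS clause). In that case this
           * method deliberately throws, which stops the ColumnMapKeyPrune rule
           * from pruning columns it cannot reason about.
           */
          void setOutputUids(LogicalRelationalOperator op) throws FrontendException {
              // ... existing pruning logic ...
          }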

          Daniel Dai made changes -
          Field Original Value New Value
          Attachment PIG-2534-1.patch [ 12515312 ]
          Daniel Dai added a comment -

          It happens when STREAM does not have an output schema. A workaround is to change
          stream cm_serve through `tr [:upper:] [:lower:]`
          into
          stream cm_serve through `tr [:upper:] [:lower:]` as (a, b, c)

          I will upload a patch shortly.
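          Applied to the script in the description (which streams through `echo val3` rather than `tr`), the workaround would read, for example:

          cm_serve_lowercase = stream cm_serve through `echo val3` as (a, b, c);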

          Vivek Padmanabhan added a comment -

          Posting a test case without SPLIT to reproduce this issue

          @Test
          public void testINFINITE() throws Exception {
              File f1 = Util.createFile(new String[] {"[key1#0,key2#5,key3#val3,key4#val4,key5#val5]"});
              File f2 = Util.createFile(new String[] {"[key1#0,key2#5,key3#val3,key4#val4,key5#val5]"});

              PigServer ps = new PigServer(ExecType.LOCAL);
              ps.registerQuery("event_serve = LOAD 'file://" + f1.getAbsolutePath() + "' AS (s, m, l);");
              ps.registerQuery("cm_data_raw = LOAD 'file://" + f2.getAbsolutePath() + "' AS (s, m, l);");
              ps.registerQuery("cm_serve = FOREACH cm_data_raw GENERATE s#'key3' AS f1, s#'key4' AS f2, s#'key5' AS f3;");
              ps.registerQuery("cm_serve_lowercase = stream cm_serve through `tr [:upper:] [:lower:]`;");
              ps.registerQuery("cm_serve_final = FOREACH cm_serve_lowercase GENERATE $0 AS cm_event_guid, $1 AS cm_receive_time, $2 AS cm_ctx_url;");
              ps.registerQuery("event_serve_project = FOREACH event_serve GENERATE s#'key3' AS event_guid, s#'key4' AS receive_time;");
              ps.registerQuery("event_serve_join = join cm_serve_final by (cm_event_guid), event_serve_project by (event_guid);");

              // Before the fix this loop never terminates: map output records
              // are generated infinitely.
              Iterator<Tuple> itr = ps.openIterator("event_serve_join");
              while (itr.hasNext()) {
                  System.out.println(itr.next());
              }
          }
          
          Vivek Padmanabhan created issue -

            People

            • Assignee: Daniel Dai
            • Reporter: Vivek Padmanabhan
            • Votes: 0
            • Watchers: 1
