Pig / PIG-2534

Pig generating infinite map outputs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.1
    • Fix Version/s: 0.10.0, 0.9.3, 0.11
    • Component/s: None
    • Labels: None
    • Hadoop Flags:
      Reviewed

      Description

      I am seeing strange behavior from Pig 0.9 with the script below.

      event_serve = LOAD 'input1'   AS (s, m, l);
      cm_data_raw = LOAD 'input2'  AS (s, m, l);
      
      SPLIT cm_data_raw INTO
          cm_serve_raw IF (( (chararray) (s#'key1') == '0') AND ( (chararray) (s#'key2') == '5')),
          cm_click_raw IF (( (chararray) (s#'key1') == '1') AND ( (chararray) (s#'key2') == '5'));
      
      cm_serve = FOREACH cm_serve_raw GENERATE  s#'key3' AS f1,  s#'key4' AS f2, s#'key5' AS f3 ;
      cm_serve_lowercase = stream cm_serve through `echo val3`;
      
      cm_serve_final = FOREACH cm_serve_lowercase GENERATE  $0 AS cm_event_guid, $1 AS cm_receive_time, $2 AS cm_ctx_url;
      
      event_serve_filtered = FILTER event_serve BY  (chararray) (s#'key1') neq 'xxx' AND (chararray) (s#'key2') neq 'yyy' ;
      
      event_serve_project = FOREACH event_serve_filtered GENERATE  s#'key3' AS event_guid, s#'key4' AS receive_time;
      
      event_serve_join = join cm_serve_final by (cm_event_guid),
          event_serve_project by (event_guid);
      
      store event_serve_join into 'somewhere';
      

      Input (input1 and input2 are identical):

      key1#0,key2#5,key3#val3,key4#val4,key5#val5

      If I run this Pig script with the ColumnMapKeyPrune optimization disabled, the job completes fine and one output record is created.
      But if I run the script with default settings, it keeps generating map output records infinitely.
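
      For reference, individual optimizer rules such as ColumnMapKeyPrune can be switched off from the Pig command line. A sketch, assuming Pig's -t/-optimizer_off option and a placeholder script name:

          pig -t ColumnMapKeyPrune script.pig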

      Attachments

      1. PIG-2534-2.patch (4 kB) - Daniel Dai
      2. PIG-2534-1.patch (3 kB) - Daniel Dai

        Issue Links

          Activity

          Vivek Padmanabhan added a comment -

          Posting a test case without SPLIT to reproduce this issue

          @Test
              public void testINFINITE() throws Exception {
          
                  File f1 = Util.createFile(new String [] {"[key1#0,key2#5,key3#val3,key4#val4,key5#val5]"} );
                  File f2 = Util.createFile(new String [] {"[key1#0,key2#5,key3#val3,key4#val4,key5#val5]"} );
          
                  PigServer ps = new PigServer(ExecType.LOCAL);
                  ps.registerQuery("event_serve = LOAD 'file://"+f1.getAbsolutePath()+"' AS (s, m, l);");
                  ps.registerQuery("cm_data_raw = LOAD 'file://"+f2.getAbsolutePath()+"' AS (s, m, l);");
                  ps.registerQuery("cm_serve = FOREACH cm_data_raw GENERATE  s#'key3' AS f1,  s#'key4' AS f2, s#'key5' AS f3 ;");
                  ps.registerQuery("cm_serve_lowercase = stream cm_serve through `tr [:upper:] [:lower:]`  ;");
                  ps.registerQuery("cm_serve_final = FOREACH cm_serve_lowercase GENERATE  $0 AS cm_event_guid, $1 AS cm_receive_time, $2 AS cm_ctx_url;");
                  ps.registerQuery("event_serve_project = FOREACH  event_serve GENERATE  s#'key3' AS event_guid, s#'key4' AS receive_time;");
                  ps.registerQuery("event_serve_join = join cm_serve_final by (cm_event_guid), event_serve_project by (event_guid);");
          
                  Iterator<Tuple> itr = ps.openIterator("event_serve_join");
                  while(itr.hasNext())
                      System.out.println(itr.next());
          
              }
          
          Daniel Dai added a comment -

          It happens when stream does not have an output schema. A workaround is to change
          stream cm_serve through `tr [:upper:] [:lower:]`
          into
          stream cm_serve through `tr [:upper:] [:lower:]` as (a, b, c)

          I will upload a patch shortly.
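
          Applied to the stream statement in the description, the workaround would look like this (the alias names a, b, c are placeholders; any names work, since the point is only to give STREAM an output schema):

              cm_serve_lowercase = stream cm_serve through `echo val3` as (a, b, c);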

          Thejas M Nair added a comment -

          +1
          I think we should add a javadoc comment to the setOutputUids function, saying that it checks for a null schema and throws an exception to stop column pruning from happening. We should also consider moving such checks (such as the null-schema check) that need to happen on all logical operators to a separate visitor (a subclass of AllLogicalExpressionVisitor?), which gets called from Transformer.check().

          Dmitriy V. Ryaboy added a comment -

          It throws an exception as part of normal flow control? Can we refactor that?

          Daniel Dai added a comment -

          Attaching PIG-2534-2.patch to add comments to setOutputUids and to fix a Findbugs warning. As for the refactoring, we can certainly do it, but since that is lower priority, I will open a separate ticket to address it.

          Daniel Dai added a comment -

          Unit tests pass. test-patch:
          [exec] -1 overall.
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec] +1 tests included. The patch appears to include 3 new or modified tests.
          [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec] -1 release audit. The applied patch generated 533 release audit warnings (more than the trunk's current 530 warnings).

          The javadoc and release audit warnings are unrelated.

          Patch committed to 0.9/0.10/trunk.

          Daniel Dai added a comment -

          Open PIG-2566 to track Dmitriy's comment.


            People

            • Assignee: Daniel Dai
            • Reporter: Vivek Padmanabhan
            • Votes: 0
            • Watchers: 1
