Pig / PIG-2535

Bug in new logical plan results in no output for join

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.1, 0.9.1, 0.10.0
    • Fix Version/s: 0.10.0, 0.9.3, 0.11
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The script below is a snippet of a much larger script. The join in the script produces no output on Pig 0.8, 0.9, and 0.10, even though there are matching records.

      event_serve = LOAD 'input1'   USING MyMapLoader() AS (s:map[], m:map[], l:map[]);
      raw = LOAD 'input2'  USING MyMapLoader() AS (s:map[], m:map[], l:map[]);
      
      SPLIT raw INTO
          serve_raw IF (( (chararray) (s#'type') == '0') AND ( (chararray) (s#'source') == '5')),
          cm_click_raw IF (( (chararray) (s#'type') == '1') AND ( (chararray) (s#'source') == '5'));
      
      cm_serve = FOREACH serve_raw GENERATE  s#'cm_serve_id' AS cm_event_guid,  s#'cm_serve_timestamp_ms' AS cm_receive_time, s#'p_url' AS ctx ;
      cm_serve_lowercase = stream cm_serve through `tr [:upper:] [:lower:]`;
      cm_serve_final = FOREACH cm_serve_lowercase GENERATE  $0 AS cm_event_guid, $1 AS cm_receive_time, $2 AS ctx;
      filtered = FILTER event_serve BY (chararray) (s#'filter_key') neq 'xxxx' AND (chararray) (s#'filter_key') neq 'yyyy';
      event_serve_project = FOREACH filtered GENERATE s#'event_guid' AS event_guid, s#'receive_time' AS receive_time;
      event_serve_join = join cm_serve_final by (cm_event_guid, cm_receive_time), event_serve_project by (event_guid, receive_time) PARALLEL 800;
      STORE event_serve_join INTO 'output' ;
      

      The script produces correct results if I disable the ColumnMapKeyPrune optimizer rule.

      1. PIG-2535-0.patch
        0.9 kB
        Daniel Dai
      2. PIG-2535-1.patch
        3 kB
        Daniel Dai
      3. PIG-2535-2.patch
        3 kB
        Daniel Dai

        Activity

        Daniel Dai added a comment -

        Unit tests pass. test-patch output:
        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 3 new or modified tests.
        [exec]
        [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] -1 release audit. The applied patch generated 533 release audit warnings (more than the trunk's current 530 warnings).

        The javadoc and release audit warnings are unrelated to this patch.

        Patch committed to 0.9/0.10/trunk.

        Daniel Dai added a comment -

        PIG-2535-2.patch is resynced with trunk.

        Thejas M Nair added a comment -

        +1

        Daniel Dai added a comment -

        Thanks, Vivek, for verifying. Attaching a full patch.

        Vivek Padmanabhan added a comment -

        Hi Daniel, the script runs fine with the patch once an output schema is applied to the following statement:

        cm_serve_lowercase = stream cm_serve through `tr [:upper:] [:lower:]`;
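
        A minimal sketch of what declaring that output schema might look like. The field names are taken from the FOREACH that follows this statement in the original script; the chararray types are an assumption, since the comment does not show the schema that was actually used.

```
-- Assumed schema: names from the script's next FOREACH, types are a guess.
cm_serve_lowercase = STREAM cm_serve THROUGH `tr [:upper:] [:lower:]`
    AS (cm_event_guid:chararray, cm_receive_time:chararray, ctx:chararray);
```

        With a declared schema, the downstream FOREACH can refer to the streamed fields by name instead of only by position ($0, $1, $2).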
        Vivek Padmanabhan added a comment -

        Thanks, Daniel. I ran the script with this patch, but the script now seems to generate infinite map outputs (see PIG-2534, "Pig generating infinite map outputs").

        I am seeing a very large ACCESSING_NON_EXISTENT_FIELD count (21,146,912,208), and it keeps increasing.

        Daniel Dai added a comment -

        MapKeysPruneHelper erroneously removes 'type' and 'source', which are used in the split conditions. Attaching a draft patch. Vivek, can you give it a try?
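
        As a rough illustration of the failure mode described in this comment (an interpretation, not taken from the patch), the relevant statements from the script are annotated here with the keys of map s that each one reads. If only the keys used by the FOREACH were kept when loading raw, the SPLIT conditions would compare null values, no record would reach serve_raw, and the join would come out empty.

```
-- Keys of s read by the SPLIT conditions: 'type', 'source'
SPLIT raw INTO
    serve_raw IF (((chararray) (s#'type') == '0') AND ((chararray) (s#'source') == '5')),
    cm_click_raw IF (((chararray) (s#'type') == '1') AND ((chararray) (s#'source') == '5'));

-- Keys of s read by the projection: 'cm_serve_id', 'cm_serve_timestamp_ms', 'p_url'
cm_serve = FOREACH serve_raw GENERATE s#'cm_serve_id' AS cm_event_guid,
    s#'cm_serve_timestamp_ms' AS cm_receive_time, s#'p_url' AS ctx;
```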


          People

          • Assignee: Daniel Dai
          • Reporter: Vivek Padmanabhan
          • Votes: 0
          • Watchers: 2
