Pig / PIG-3446 Umbrella jira for Pig on Tez / PIG-3893

Hash join followed by replicated join fails in Tez mode

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: tez-branch
    • Component/s: tez
    • Labels: None

      Description

      To reproduce the issue, run this query:

      x = LOAD 'foo' AS (x:int, y:chararray);
      y = LOAD 'bar' AS (x:int, y:chararray);
      a = JOIN x BY x, y BY x;
      z = LOAD 'zoo' AS (x:int, y:chararray);
      b = JOIN a BY x::x, z BY x USING 'replicated';
      DUMP b;
      

      This fails with the following error:

                          : Container released by application, AttemptID:attempt_1397437587062_0071_1_03_000000_3 Info:Error: org.apache.pig.backend.executionengine.ExecException: ERROR 0: java.lang.ClassCastException: org.apache.tez.runtime.library.common.readers.ShuffledUnorderedKVReader cannot be cast to org.apache.tez.runtime.library.api.KeyValuesReader
                          : at org.apache.pig.backend.hadoop.executionengine.tez.POShuffleTezLoad.attachInputs(POShuffleTezLoad.java:108)
                          : at org.apache.pig.backend.hadoop.executionengine.tez.PigProcessor.initializeInputs(PigProcessor.java:202)
                          : at org.apache.pig.backend.hadoop.executionengine.tez.PigProcessor.run(PigProcessor.java:141)
                          : at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307)
                          : at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562)
                          : at java.security.AccessController.doPrivileged(Native Method)
                          : at javax.security.auth.Subject.doAs(Subject.java:415)
                          : at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
                          : at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551)
                          : Caused by: java.lang.ClassCastException: org.apache.tez.runtime.library.common.readers.ShuffledUnorderedKVReader cannot be cast to org.apache.tez.runtime.library.api.KeyValuesReader
                          : at org.apache.pig.backend.hadoop.executionengine.tez.POShuffleTezLoad.attachInputs(POShuffleTezLoad.java:89)
                          : ... 8 more
      

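      The problem is that the POLR (POLocalRearrange) that belongs to the FRJoin (replicated join) is attached to POShuffleTezLoad, because the replicated join runs in the same vertex as the hash join. POShuffleTezLoad.attachInputs then casts the reader of that input to KeyValuesReader, but the reader is a ShuffledUnorderedKVReader, which produces the ClassCastException above.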
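
The failure mode can be reproduced in isolation. The following is a minimal sketch, not Pig or Tez code: the two abstract reader classes are stand-ins for the Tez types named in the stack trace and are assumed to be unrelated siblings, `GroupedShuffleReader` is a hypothetical name, and `attachInputs` only imitates a loader that expects every attached input to yield a `KeyValuesReader`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-ins for the Tez reader types in the stack trace (assumed unrelated siblings).
abstract class KeyValueReader {}   // one value per key, as used by unordered inputs
abstract class KeyValuesReader {}  // grouped values per key, as used by shuffle inputs

class ShuffledUnorderedKVReader extends KeyValueReader {}
class GroupedShuffleReader extends KeyValuesReader {}  // hypothetical name

class CastFailureSketch {
    // Imitates a loader that assumes every attached input is a grouped shuffle input.
    static List<KeyValuesReader> attachInputs(List<Object> readers) {
        List<KeyValuesReader> attached = new ArrayList<>();
        for (Object reader : readers) {
            // Throws ClassCastException for the unordered (replicated join) reader.
            attached.add((KeyValuesReader) reader);
        }
        return attached;
    }

    public static void main(String[] args) {
        List<Object> readers = Arrays.<Object>asList(
                new GroupedShuffleReader(),        // hash join input
                new ShuffledUnorderedKVReader());  // replicated join input
        try {
            attachInputs(readers);
            System.out.println("all inputs attached");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
    }
}
```

With only grouped readers in the list the loop completes. Adding the unordered reader makes the cast fail, which is the shape of the error reported here.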

      1. PIG-3893-1.patch
        3 kB
        Cheolsoo Park

        Activity

        Cheolsoo Park created issue
        Cheolsoo Park added a comment (edited)

        The attached patch adds a boolean flag to POLR that indicates whether it belongs to an FRJoin. In TezDagBuilder, any input that belongs to an FRJoin POLR is skipped.
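
The flag-and-skip idea described in this comment might look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of PIG-3893-1.patch: `LocalRearrange` is a stand-in for POLR, the field name `feedsReplicatedJoin` is hypothetical, and `shuffleInputs` only imitates the step where the DAG builder decides which inputs to hand to the shuffle loader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FlagAndSkipSketch {
    // Stand-in for POLR; only the output key and the new flag are modeled.
    static final class LocalRearrange {
        final String outputKey;
        final boolean feedsReplicatedJoin;  // hypothetical name for the boolean flag

        LocalRearrange(String outputKey, boolean feedsReplicatedJoin) {
            this.outputKey = outputKey;
            this.feedsReplicatedJoin = feedsReplicatedJoin;
        }
    }

    // Collects the inputs to attach to the shuffle loader, skipping any input
    // whose rearrange operator is flagged as belonging to a replicated join.
    static List<String> shuffleInputs(List<LocalRearrange> rearranges) {
        List<String> inputs = new ArrayList<>();
        for (LocalRearrange lr : rearranges) {
            if (lr.feedsReplicatedJoin) {
                continue;  // replicated join input: not a grouped shuffle input
            }
            inputs.add(lr.outputKey);
        }
        return inputs;
    }

    public static void main(String[] args) {
        List<LocalRearrange> rearranges = Arrays.asList(
                new LocalRearrange("x", false),  // hash join side
                new LocalRearrange("y", false),  // hash join side
                new LocalRearrange("z", true));  // replicated join side
        System.out.println(shuffleInputs(rearranges));  // prints [x, y]
    }
}
```

Keeping the decision in a flag on the operator means the builder does not need to inspect the downstream plan to tell the two kinds of input apart.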

        Cheolsoo Park made changes
        Field       Original Value  New Value
        Attachment                  PIG-3893-1.patch [ 12640123 ]
        Daniel Dai added a comment

        +1

        Cheolsoo Park added a comment

        Committed to tez branch. Thank you Daniel!

        Cheolsoo Park made changes
        Status      Open [ 1 ]      Resolved [ 5 ]
        Resolution                  Fixed [ 1 ]
        Daniel Dai made changes
        Status      Resolved [ 5 ]  Closed [ 6 ]

        Transition          Time In Source Status  Execution Times  Last Executer  Last Execution Date
        Open -> Resolved    1h 39m                 1                Cheolsoo Park  14/Apr/14 21:57
        Resolved -> Closed  220d 9h 1m             1                Daniel Dai     21/Nov/14 05:58

          People

          • Assignee: Cheolsoo Park
          • Reporter: Cheolsoo Park
          • Votes: 0
          • Watchers: 2
