Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: 0.14.0
    • Component/s: tez
    • Labels:
      None

      Description

      To reproduce the issue, run the following query:

      x = LOAD 'foo' AS (x:int, y:chararray);
      y = LOAD 'bar' AS (x:int, y:chararray);
      a = JOIN x BY x, y BY x USING 'skewed';
      z = LOAD 'zoo' AS (x:int, y:chararray);
      b = JOIN a BY x::x, z BY x USING 'replicated';
      DUMP b;
      

      This fails at runtime with the following error:

      : Container released by application, AttemptID:attempt_1399657418038_0357_1_04_000000_3 Info:Error: org.apache.pig.backend.executionengine.ExecException: ERROR 2135: Received error from POLocalRearrage function.wrong key class: class org.apache.pig.impl.io.NullableIntWritable is not class org.apache.pig.impl.io.NullablePartitionWritable
      : at org.apache.pig.backend.hadoop.executionengine.tez.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:175)
      : at org.apache.pig.backend.hadoop.executionengine.tez.PigProcessor.runPipeline(PigProcessor.java:276)
      : at org.apache.pig.backend.hadoop.executionengine.tez.PigProcessor.run(PigProcessor.java:175)
      : at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307)
      : at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:581)
      : at java.security.AccessController.doPrivileged(Native Method)
      : at javax.security.auth.Subject.doAs(Subject.java:415)
      : at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
      : at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:570)
      : Caused by: java.io.IOException: wrong key class: class org.apache.pig.impl.io.NullableIntWritable is not class org.apache.pig.impl.io.NullablePartitionWritable
      : at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.append(IFile.java:212)
      : at org.apache.tez.runtime.library.broadcast.output.FileBasedKVWriter.write(FileBasedKVWriter.java:149)
      : at org.apache.pig.backend.hadoop.executionengine.tez.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:160)
      : ... 8 more
      
      1. PIG-3959-1.patch
        4 kB
        Cheolsoo Park

        Activity

        Cheolsoo Park created issue -
        Daniel Dai made changes -
        Field Original Value New Value
        Fix Version/s 0.14.0 [ 12326954 ]
        Fix Version/s tez-branch [ 12324968 ]
        Cheolsoo Park added a comment -

        The problem is as follows:

        • If a replicated join happens in the same vertex as a skewed join, three edges are connected to the join vertex: two for the skewed join's input tables and one for the replicated join's input table.
        • Because TezDagBuilder unconditionally sets the intermediate input/output key class of all inbound edges to NullablePartitionWritable, the key class of the replicated join's input edge is also set to NullablePartitionWritable, even though the keys written to that edge are NullableIntWritable.
        • This causes an exception in POLocalRearrangeTez.java:
          wrong key class: class org.apache.pig.impl.io.NullableIntWritable is not class org.apache.pig.impl.io.NullablePartitionWritable
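
The mismatch described above can be sketched in isolation. This is a minimal illustration, not the Tez or Pig code: `SketchWriter`, `tryAppend`, and the two empty key classes are hypothetical stand-ins, and the only assumption is that the writer on the edge is configured with one key class and rejects keys of any other class, as the "wrong key class" message suggests.

```java
import java.io.IOException;

public class KeyClassCheckSketch {
    // Hypothetical stand-ins for the two Pig key types named in the error.
    static class NullableIntWritable {}
    static class NullablePartitionWritable {}

    // Hypothetical writer: configured with one key class, rejects any other.
    static class SketchWriter {
        private final Class<?> keyClass;

        SketchWriter(Class<?> keyClass) {
            this.keyClass = keyClass;
        }

        void append(Object key, Object value) throws IOException {
            if (key.getClass() != keyClass) {
                throw new IOException("wrong key class: " + key.getClass()
                        + " is not " + keyClass);
            }
            // A real writer would serialize the key/value pair here.
        }
    }

    // Returns "ok" if the append is accepted, otherwise the error message.
    static String tryAppend(Class<?> configuredKeyClass, Object key) {
        try {
            new SketchWriter(configuredKeyClass).append(key, "value");
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Edge configured for partition keys, but an int key is written: rejected.
        System.out.println(tryAppend(NullablePartitionWritable.class, new NullableIntWritable()));
        // Edge configured for int keys, and an int key is written: accepted.
        System.out.println(tryAppend(NullableIntWritable.class, new NullableIntWritable()));
    }
}
```

The first call mirrors the replicated join input edge in this issue: the edge is declared with one key class while the operator writes another.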
          

        In the attached patch, I changed the condition in TezDagBuilder to apply NullablePartitionWritable only if isSkewedJoin && isConnectedToPackage.
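
As a rough sketch of the changed condition (not the patch itself): the flag names `isSkewedJoin` and `isConnectedToPackage` come from the comment above, while the method `intermediateKeyClass`, its string return values, and the per-edge flag values in `main` are assumptions made for illustration. In particular, it assumes the replicated join input edge is the one for which `isConnectedToPackage` is false.

```java
public class EdgeKeyClassSketch {
    // Hypothetical helper: picks the key class name for one inbound edge.
    // plainKeyClass is whatever key class the edge would use without skewed join.
    static String intermediateKeyClass(boolean isSkewedJoin,
                                       boolean isConnectedToPackage,
                                       String plainKeyClass) {
        // Condition described in the comment: use the partition key class
        // only when both flags hold, instead of for every inbound edge.
        if (isSkewedJoin && isConnectedToPackage) {
            return "NullablePartitionWritable";
        }
        return plainKeyClass;
    }

    public static void main(String[] args) {
        String plain = "NullableIntWritable";
        // Two skewed join input edges (assumed to be connected to the package).
        System.out.println("skewed input 1: " + intermediateKeyClass(true, true, plain));
        System.out.println("skewed input 2: " + intermediateKeyClass(true, true, plain));
        // Replicated join input edge in the same vertex (assumed not connected).
        System.out.println("replicated input: " + intermediateKeyClass(true, false, plain));
    }
}
```

Under these assumptions, only the two skewed join edges get NullablePartitionWritable, and the replicated edge keeps the key class that its operator actually writes.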

        Cheolsoo Park made changes -
        Attachment PIG-3959-1.patch [ 12649621 ]
        Cheolsoo Park made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Rohini Palaniswamy added a comment -

        +1

        Cheolsoo Park added a comment -

        Committed to trunk. Thank you Rohini for the review!

        Cheolsoo Park made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Daniel Dai made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                   Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Patch Available      18d 23h 46m             1                 Cheolsoo Park   10/Jun/14 18:23
        Patch Available -> Resolved  5h 9m                   1                 Cheolsoo Park   10/Jun/14 23:32
        Resolved -> Closed           163d 7h 25m             1                 Daniel Dai      21/Nov/14 05:58

          People

          • Assignee:
            Cheolsoo Park
            Reporter:
            Cheolsoo Park
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
