Pig / PIG-3355

ColumnMapKeyPrune bug with distinct operator

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.2, 0.10.1, 0.11.1
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels:
      None

      Description

      We came across a bug that occurs when a distinct operator is immediately followed by a union and the result of the union has at least one column that ColumnMapKeyPrune will prune. The submitted patch includes a test with an example script.

      1. PIG-3355.patch
        3 kB
        Jeremy Karn

        Activity

        Daniel Dai made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Cheolsoo Park made changes -
        Fix Version/s 0.12 [ 12323380 ]
        Cheolsoo Park added a comment -

        Updating the fix version.

        Aniket Mokashi made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Aniket Mokashi added a comment -

        Koji Noguchi, feel free to rebase and submit a patch for 0.11.

        Koji Noguchi added a comment -

        Committed to trunk. Thanks Jeremy!

        Aniket Mokashi, status is still "Patch Available"?
        Also, can we patch 0.11 as well so that it'll be included if we release another 0.11.* ?

        Aniket Mokashi added a comment -

        Committed to trunk. Thanks Jeremy!

        Aniket Mokashi made changes -
        Assignee Jeremy Karn [ jeremykarn ]
        Aniket Mokashi added a comment -

        +1. Will commit if ant test-commit passes.

        Jeremy Karn added a comment -

        I should also mention that this bug manifests itself in a couple of different ways. The job generally crashes at some point where the schema doesn't match the data tuple. The most common exceptions we've seen look like this:

        java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
        at java.util.ArrayList.get(ArrayList.java:322)
        at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:159)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getValueTuple(POPackage.java:341)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:264)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:416)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:407)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)

        2013-06-13 15:28:14,188 java.io.IOException: Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableBytesWritable
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:845)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:127)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:273)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:266)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
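
The first trace above (Index: 2, Size: 2) is what a positional read looks like when an operator expects more columns than the tuple actually carries. The following is only a minimal sketch of that failure mode in plain Java, not Pig's code and not the test from the patch: the class SchemaTupleMismatch and its methods are hypothetical names, and a plain list stands in for a tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaTupleMismatch {
    // Hypothetical stand-in for a positional tuple read: a plain list lookup.
    static Object readField(List<?> tuple, int index) {
        // Throws IndexOutOfBoundsException when index >= tuple.size().
        return tuple.get(index);
    }

    // Reads every column the (assumed) schema expects and reports
    // whether any read ran past the end of the tuple.
    static boolean failsWithIndexOutOfBounds(List<?> tuple, int expectedColumns) {
        try {
            for (int i = 0; i < expectedColumns; i++) {
                readField(tuple, i);
            }
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A tuple that carries only two fields.
        List<Object> tuple = new ArrayList<>(Arrays.asList("k", "v"));

        // The reader still expects three columns: reading index 2 fails.
        System.out.println(failsWithIndexOutOfBounds(tuple, 3)); // prints true

        // Reader and data agree on two columns: no failure.
        System.out.println(failsWithIndexOutOfBounds(tuple, 2)); // prints false
    }
}
```

The sketch covers only the IndexOutOfBoundsException case. The "Type mismatch in key from map" error is a different symptom of the same schema and data disagreement and is not modeled here.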

        Jeremy Karn made changes -
        Affects Version/s 0.11.1 [ 12324080 ]
        Affects Version/s 0.10.1 [ 12320547 ]
        Affects Version/s 0.9.2 [ 12318248 ]
        Jeremy Karn made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Jeremy Karn made changes -
        Field Original Value New Value
        Attachment PIG-3355.patch [ 12587866 ]
        Jeremy Karn created issue -

          People

          • Assignee:
            Jeremy Karn
            Reporter:
            Jeremy Karn
          • Votes:
            0
            Watchers:
            5

            Dates

            • Created:
              Updated:
              Resolved:
