Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-3972

java.lang.IndexOutOfBoundsException when flatten meets an empty row

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.12.0, 0.13.0
    • None
    • None
    • None

    Description

      test1.txt
      {(,A1111,A),(,B222,B),(,C333,C)}
      
      test2.txt
      A       Helloworld
      B       Pig
      C       Hive
      
      tt.pig
      A = LOAD 'test1.txt' AS (mybag:bag{t:tuple(title:chararray,name:chararray, id:chararray)});
      A1 = FOREACH A generate flatten(mybag) as (title:chararray,name:chararray, id:chararray);
      B = LOAD 'test2.txt' AS (id:chararray, content:chararray);
      C = JOIN A1 BY id LEFT OUTER, B BY id;
      dump C;
      
      $ pig -x local -f tt.pig
      ....
      java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
              at java.util.ArrayList.rangeCheck(ArrayList.java:604)
              at java.util.ArrayList.get(ArrayList.java:382)
              at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:115)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getValueTuple(POPackage.java:350)
              at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:273)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:425)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:416)
              at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256)
              at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
              at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:636)
              at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:396)
              at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:441)
            [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failu
      re if you want Pig to stop immediately on failure.
      ==============
      

      If we change the code into this one:

      tt.pig
      A = LOAD 'test1.txt' AS (mybag:bag{t:tuple(title:chararray,name:chararray, id:chararray)});
      A1 = FOREACH A generate flatten(mybag) as (title:chararray,name:chararray, id:chararray);
      A1 = FOREACH A1 generate title,name,id;
      B = LOAD 'test2.txt' AS (id:chararray, content:chararray);
      C = JOIN A1 BY id LEFT OUTER, B BY id;
      dump C;
      

      The job succeed, and here is the result of execution.

      ========
      (,A1111,A,A,Helloworld)
      (,B222,B,B,Pig)
      (,C333,C,C,Hive)
      (,,,,)
      ========
      

      Attachments

        Issue Links

          Activity

            People

              chitnis Mona Chitnis
              jiangbinglover Bing Jiang
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated: