  1. Pig
  2. PIG-2932

Setting high default_parallel causes IOException in local mode

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • None
    • 0.11
    • None
    • None

    Description

      This bug has been confirmed only in local mode.

      When default_parallel is set to a high value, Pig fails on some operations (in the example below, the job that processes the ORDER BY alias).
      The following data and script reproduce the bug.

      Data:

      grunt> cat file.txt                                  
      11	1	qwer
      12	2	qwerty
      13	3	ert
      13	3	ertyu
      14	4	zxcv
      16	6	fsdfg
      16	6	fdfghj
      18	8	fjklopi
      

      Script:

      SET default_parallel 9
      a = load 'file.txt' as (id1:int, id2:int, str:chararray);
      b = group a by (id1,id2);
      c = foreach b generate flatten(group), a;
      d = order c by group::id1 ASC, group::id2 ASC;
      dump d;
      

      Error:

      2012-09-26 15:28:13,230 [Thread-32] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: d[12,4] C:  R: 
      2012-09-26 15:28:13,232 [Thread-32] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0009
      java.io.IOException: Illegal partition for Null: false index: 0 (12,2) (1)
      	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
      	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
      	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
      

      The script succeeds if default_parallel is set to 2.
      My guess is that the failure occurs when default_parallel is higher than the number of unique keys, probably because of some quirk in ORDER BY.
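
      A note on reading the exception: the trailing (1) in "Illegal partition for Null: false index: 0 (12,2) (1)" is the partition index chosen for key (12,2), and the map output collector rejects any index outside the range 0 to numPartitions - 1. The sketch below is a rough Python model of that range check, not Hadoop's or Pig's actual code. The function names and the key-to-partition assignments are hypothetical, except for key (12,2) landing in partition 1, which comes from the log.

```python
def check_partition(key, partition, num_partitions):
    """Model of the range check applied to each map output record.

    Hypothetical helper: raises if the partition index is not in
    [0, num_partitions), mirroring the shape of the logged message.
    """
    if partition < 0 or partition >= num_partitions:
        raise IOError(f"Illegal partition for {key} ({partition})")
    return partition


def failing_keys(assignments, num_partitions):
    """Return the keys whose assigned partition would be rejected."""
    rejected = []
    for key, partition in assignments:
        try:
            check_partition(key, partition, num_partitions)
        except IOError:
            rejected.append(key)
    return rejected


# Hypothetical output of a range partitioner planned for 9 reducers.
# Only the (12,2) -> 1 pair is taken from the log; the rest are assumed.
planned = [
    ("(11,1)", 0),
    ("(12,2)", 1),
    ("(13,3)", 2),
    ("(14,4)", 3),
    ("(16,6)", 4),
    ("(18,8)", 5),
]

# With 9 partitions available every index is legal.
print(failing_keys(planned, 9))

# If the task actually runs with a single partition, any index >= 1 fails.
print(failing_keys(planned, 1))
```

      In this model, partition index 1 is rejected only when the task has at most one partition. That would fit a local run that uses a single reducer regardless of default_parallel, but the report itself does not state the reducer count, so treat this as an assumption.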

      Attachments

        1. PIG-2932.patch
          3 kB
          Cheolsoo Park

            People

              cheolsoo Cheolsoo Park
              azaroth Gianmarco De Francisci Morales