Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-3469

Skewed join can cause unrecoverable NullPointerException when one of its inputs is missing.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.11
    • 0.13.0
    • None
    • None
    • Apache Pig version 0.11.0-cdh4.4.0
      Happens in both local execution environment (os x) and cluster environment (linux)

    Description

      Run this script in the local execution environment (affects cluster mode too):

      %declare DATA_EXISTS /tmp/test_data_exists.tsv
      %declare DATA_MISSING /tmp/test_data_missing.tsv
      %declare DUMMY `bash -c '(for (( i=0; \$i < 10; i++ )); do echo \$i; done) > /tmp/test_data_exists.tsv; true'`
      
      exists = LOAD '$DATA_EXISTS' AS (a:long);
      missing = LOAD '$DATA_MISSING' AS (a:long);
      
      missing = FOREACH ( GROUP missing BY a ) GENERATE $0 AS a, COUNT_STAR($1);
      
      joined = JOIN exists BY a, missing BY a USING 'skewed';
      
      STORE joined INTO '/tmp/test_out.tsv';
      

      Results in NullPointerException which halts entire pig execution, including unrelated jobs. Expected: only dependencies of the error'd LOAD statement should fail.

      Error:

      2013-09-18 11:42:31,518 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
      2013-09-18 11:42:31,518 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:848)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:294)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
      	at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
      	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
      	at org.apache.pig.PigServer.execute(PigServer.java:1241)
      	at org.apache.pig.PigServer.executeBatch(PigServer.java:335)
      	at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
      	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
      	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
      	at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
      	at org.apache.pig.Main.run(Main.java:604)
      	at org.apache.pig.Main.main(Main.java:157)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      	at java.lang.reflect.Method.invoke(Method.java:597)
      	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
      Caused by: java.lang.NullPointerException
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.adjustNumReducers(JobControlCompiler.java:868)
      	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:480)
      	... 17 more
      

      Script above is as small as I can make it while still reproducing the issue. Removing the group-foreach causes the join to fail harmlessly (not stopping pig execution), as does using the default join. Did not occur on 0.10.1.

      Attachments

        1. PIG-3469.patch
          0.9 kB
          Jarek Jarcec Cecho
        2. PIG-3469.patch
          12 kB
          Jarek Jarcec Cecho
        3. PIG-3469.patch
          2 kB
          Jarek Jarcec Cecho

        Activity

          People

            jarcec Jarek Jarcec Cecho
            xton Christon DeWan
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: