PIG-2152: Null pointer exception while reporting progress

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.1, 0.10.0
    • Component/s: None
    • Labels: None

      Description

      We have observed the following issue with code built from the Pig 0.9 branch. We have not seen this with earlier versions; however, since it happens only once in a while and is not reproducible at will, it is not clear whether the issue is specific to 0.9.

      Here is the stack:

      java.lang.NullPointerException
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.ProgressableReporter.progress(ProgressableReporter.java:37)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:399)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
          at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:261)
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:256)
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:58)
          at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
          at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
          at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:396)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
          at org.apache.hadoop.mapred.Child.main(Child.java:255)

      Note that the code in the progress function looks as follows:

      public void progress() {
          if (rep != null) {
              rep.progress();
          }
      }

      This points to some sort of synchronization issue.
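      A hedged sketch of that check-then-act window (the class and interface names below are simplified stand-ins, not Pig's actual ProgressableReporter): the null check and the call read the field twice, so another thread can null it in between. Snapshotting the field into a local variable closes the window.

      ```java
      // Minimal sketch of the check-then-act hazard. Reporter and
      // Progressable are hypothetical stand-ins for illustration only.
      public class Main {
          interface Progressable { void progress(); }

          static class Reporter {
              private volatile Progressable rep;

              void setRep(Progressable p) { rep = p; }

              // Unsafe: another thread may null rep between the check and the call.
              void progressUnsafe() {
                  if (rep != null) rep.progress();
              }

              // Safer: read the volatile field once into a local, so the
              // null check and the invocation see the same reference.
              void progressSafe() {
                  Progressable r = rep;
                  if (r != null) r.progress();
              }
          }

          public static void main(String[] args) {
              Reporter reporter = new Reporter();
              int[] calls = {0};
              reporter.setRep(() -> calls[0]++);
              reporter.progressSafe();   // delegates once
              reporter.setRep(null);     // e.g. a cleanup() in another thread
              reporter.progressSafe();   // no-op instead of a potential NPE
              System.out.println("calls=" + calls[0]);  // prints calls=1
          }
      }
      ```

      The local-snapshot pattern avoids taking a lock on every progress call, which matters on a hot path like this one.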

        Attachments

        1. PIG-2152.1.patch (0.6 kB, Thejas M Nair)
        2. null_pointer_traces (copy) (256 kB, Vivek Padmanabhan)

        Activity

        Thejas M Nair added a comment -

        Patch committed to 0.9 branch and trunk.

        Daniel Dai added a comment -

        I would suggest committing the patch as is.

        Daniel Dai added a comment -

        +1 for the patch. Can anyone give it a try?

        Thejas M Nair added a comment -

        Vivek and Josh, thanks for tracing the issue. I have the change to fix this in PIG-2152.1.patch, but I don't have the setup and query needed to verify the fix. It is not easy to cover this in a unit test, so the patch does not include one.

        Josh Wills added a comment -

        I think that simply removing the line:

        pigReporter.setRep(null);

        is the best solution-- I don't see what purpose it was supposed to serve.

        Daniel Dai added a comment -

        Or we can make ProgressableReporter.progress synchronized.

        Olga Natkovich added a comment -

        Vivek has discovered the following:

        The ProgressableReporter object is set to null in the combiner cleanup:

        protected void cleanup(Context context) throws IOException, InterruptedException {
            super.cleanup(context);
            leaf = null;
            pack = null;
            pigReporter.setRep(null);   // <-- nulls the rep inside the shared reporter
            pigReporter = null;
            pigContext = null;
            roots = null;
            cp = null;
        }

        The same object (pigReporter) is retained and used by all PhysicalOperators (PhysicalOperator.setReporter(pigReporter)). This may have caused the NullPointerException. The above changes were introduced as part of PIG-1815.
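        A hedged sketch of why nulling that shared reference breaks the map pipeline (these classes are simplified stand-ins, not Pig's actual ones): with one reporter instance shared process-wide, the combiner's cleanup() reaches straight into state the map task is still using in the same JVM.

        ```java
        // Simplified stand-ins for a JVM-wide shared reporter, as described
        // above for PhysicalOperator.setReporter(pigReporter).
        public class SharedReporterDemo {
            static class Reporter {
                private Runnable rep = () -> {};
                void setRep(Runnable r) { rep = r; }
                void progress() { rep.run(); }   // throws NPE once rep is null
            }

            // Mirrors the single global reporter held by all physical operators.
            static Reporter sharedReporter = new Reporter();

            public static void main(String[] args) {
                // The combiner's cleanup() nulls the rep inside the shared reporter...
                sharedReporter.setRep(null);
                // ...so the map pipeline's next progress() call hits the NPE
                // seen in the reported stack traces.
                try {
                    sharedReporter.progress();
                    System.out.println("no exception");
                } catch (NullPointerException e) {
                    System.out.println("NullPointerException, matching the trace");
                }
            }
        }
        ```

        This is why both proposed fixes work: removing the setRep(null) call leaves the shared state intact, while a null-safe progress() tolerates it.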

        Josh Wills added a comment -

        Hey, I hit this issue too, using CDH3u1. I'm wondering if you're hitting it when a) you're running over lots of data and b) using combiners. It looks to me like the issue is the setRep(null) call in the cleanup() method in PigCombiner, which impacts the map tasks because PhysicalOperator only uses a single global PigReporter and the map tasks and the combiner tasks run in the same JVM.

        Could we reproduce this by messing with the settings to force spills more often than just once?

        Vivek Padmanabhan added a comment -

        Even though it is not reproducible at will, the exception happens randomly and quite frequently.
        In most of the failed jobs it appears towards the end of task execution. So far it has been seen only in map tasks.
        Below is one NullPointerException from progress() with a different call hierarchy:

        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.ProgressableReporter.progress(ProgressableReporter.java:37)
        at org.apache.pig.data.DefaultAbstractBag.reportProgress(DefaultAbstractBag.java:369)
        at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.next(DefaultDataBag.java:165)
        at org.apache.pig.data.DefaultDataBag$DefaultDataBagIterator.hasNext(DefaultDataBag.java:157)
        at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:522)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:542)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:542)
        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
        at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:57)
        at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1069)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:124)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:263)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:256)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:58)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:255)

        Vivek Padmanabhan added a comment -

        Attaching a list of the different traces.


  People

  • Assignee: Thejas M Nair
  • Reporter: Olga Natkovich
  • Votes: 0
  • Watchers: 2
