Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-2470

Receiving NPE occasionally on RunningJob.getCounters() call

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.23.0
    • Component/s: client
    • Labels:
      None
    • Environment:

      FreeBSD, Java6, Hadoop r0.21.0

    • Hadoop Flags:
      Reviewed

      Description

      This is running in a Java daemon that is used as an interface (Thrift) to get information and data from MR Jobs. Using JobClient.getJob(JobID) I successfully get a RunningJob object (I'm checking for NULL), and then rarely I get an NPE when I do RunningJob.getCounters(). This seems to occur after the daemon has been up and running for a while, and in the event of an Exception, I close the JobClient, set it to NULL, and a new one should then be created on the next request for data. Yet, I still seem to be unable to fetch the Counters. Below is the stack trace.

      java.lang.NullPointerException
      at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
      at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
      at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
      at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
      at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
      at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
      at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:619)

      1. counters_null_data.pcap
        1.0 kB
        Aaron Baff
      2. MAPREDUCE-2470-v1.patch
        2 kB
        Robert Joseph Evans
      3. MAPREDUCE-2470-v2.patch
        5 kB
        Robert Joseph Evans
      4. MAPREDUCE-2470-v3.patch
        7 kB
        Robert Joseph Evans

        Activity

        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #690 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/690/)
        MAPREDUCE-2470. Fix NPE in RunningJobs::getCounters.
        Contributed by Robert Joseph Evans

        cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1127444
        Files :

        • /hadoop/mapreduce/trunk/CHANGES.txt
        • /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java
        • /hadoop/mapreduce/trunk/src/test/system/test/org/apache/hadoop/mapred/TestTaskKilling.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #690 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/690/ ) MAPREDUCE-2470 . Fix NPE in RunningJobs::getCounters. Contributed by Robert Joseph Evans cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1127444 Files : /hadoop/mapreduce/trunk/CHANGES.txt /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java /hadoop/mapreduce/trunk/src/test/system/test/org/apache/hadoop/mapred/TestTaskKilling.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #700 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/700/)
        MAPREDUCE-2470. Fix NPE in RunningJobs::getCounters.
        Contributed by Robert Joseph Evans

        cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1127444
        Files :

        • /hadoop/mapreduce/trunk/CHANGES.txt
        • /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java
        • /hadoop/mapreduce/trunk/src/test/system/test/org/apache/hadoop/mapred/TestTaskKilling.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #700 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/700/ ) MAPREDUCE-2470 . Fix NPE in RunningJobs::getCounters. Contributed by Robert Joseph Evans cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1127444 Files : /hadoop/mapreduce/trunk/CHANGES.txt /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java /hadoop/mapreduce/trunk/src/test/system/test/org/apache/hadoop/mapred/TestTaskKilling.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java
        Hide
        Chris Douglas added a comment -

        I committed the revised patch. Thanks, Robert!

        Show
        Chris Douglas added a comment - I committed the revised patch. Thanks, Robert!
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12480124/MAPREDUCE-2470-v3.patch
        against trunk revision 1126499.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/295//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/295//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/295//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12480124/MAPREDUCE-2470-v3.patch against trunk revision 1126499. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/295//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/295//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/295//console This message is automatically generated.
        Hide
        Robert Joseph Evans added a comment -

        Fixed issues with fault injection build.

        Show
        Robert Joseph Evans added a comment - Fixed issues with fault injection build.
        Hide
        Robert Joseph Evans added a comment -

        I am very sorry about that. I ran the tests after V1, but not V2 of the patch. I will investigate the failures.

        Show
        Robert Joseph Evans added a comment - I am very sorry about that. I ran the tests after V1, but not V2 of the patch. I will investigate the failures.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #686 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/686/)
        Revert MAPREDUCE-2470
        MAPREDUCE-2470. Fix NPE in RunningJobs::getCounters.
        Contributed by Robert Joseph Evans

        cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1125599
        Files :

        • /hadoop/mapreduce/trunk/CHANGES.txt
        • /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java

        cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1125578
        Files :

        • /hadoop/mapreduce/trunk/CHANGES.txt
        • /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #686 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/686/ ) Revert MAPREDUCE-2470 MAPREDUCE-2470 . Fix NPE in RunningJobs::getCounters. Contributed by Robert Joseph Evans cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1125599 Files : /hadoop/mapreduce/trunk/CHANGES.txt /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1125578 Files : /hadoop/mapreduce/trunk/CHANGES.txt /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #694 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/694/)
        Revert MAPREDUCE-2470

        cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1125599
        Files :

        • /hadoop/mapreduce/trunk/CHANGES.txt
        • /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java
        • /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #694 (See https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/694/ ) Revert MAPREDUCE-2470 cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1125599 Files : /hadoop/mapreduce/trunk/CHANGES.txt /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestNetworkedJob.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/protocol/ClientProtocol.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/RunningJob.java /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobClient.java
        Hide
        Chris Douglas added a comment -

        Reverted while FI breakage is investigated

        Show
        Chris Douglas added a comment - Reverted while FI breakage is investigated
        Hide
        Chris Douglas added a comment -

        sigh Yea, it looks like the fault injection is broken. I'll revert it. Not sure why Hudson didn't flag that...

        Show
        Chris Douglas added a comment - sigh Yea, it looks like the fault injection is broken. I'll revert it. Not sure why Hudson didn't flag that...
        Hide
        Todd Lipcon added a comment -
        Show
        Todd Lipcon added a comment - This seems to have broken the MR trunk build: https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/691/
        Hide
        Chris Douglas added a comment -

        The difference is that with Job.getStatus it calls internally updateStatus which will throw an IOException if it is unable to get an updated status from the JobTracker. The only time you can get an NPE from that is if you catch the IOException, ignore it, and keep trying to use the Job object what was originally created.

        Got it. Thanks for the explanation.

        +1 I committed this. Thanks!

        Show
        Chris Douglas added a comment - The difference is that with Job.getStatus it calls internally updateStatus which will throw an IOException if it is unable to get an updated status from the JobTracker. The only time you can get an NPE from that is if you catch the IOException, ignore it, and keep trying to use the Job object what was originally created. Got it. Thanks for the explanation. +1 I committed this. Thanks!
        Hide
        Robert Joseph Evans added a comment -

        Oh I forgot to add that all of the other downgrade methods in JobClient.java are similar to Job.getStatus, either they return values from the cached JobStatus which should never be null except if an IOException was caught and ignored, so they are not likely to cause an NPE.

        Show
        Robert Joseph Evans added a comment - Oh I forgot to add that all of the other downgrade methods in JobClient.java are similar to Job.getStatus, either they return values from the cached JobStatus which should never be null except if an IOException was caught and ignored, so they are not likely to cause an NPE.
        Hide
        Robert Joseph Evans added a comment -

        The difference is that with Job.getStatus it calls internally updateStatus which will throw an IOException if it is unable to get an updated status from the JobTracker. The only time you can get an NPE from that is if you catch the IOException, ignore it, and keep trying to use the Job object what was originally created. We can fix that, by modifying updateStatus to not overwrite Job.status until it knows that the updated status is not null. But that was not explicitly part of this JIRA, and I could not find any other JIRA to cover it so I though it was not a big deal, and most likely worked as designed.

        Show
        Robert Joseph Evans added a comment - The difference is that with Job.getStatus it calls internally updateStatus which will throw an IOException if it is unable to get an updated status from the JobTracker. The only time you can get an NPE from that is if you catch the IOException, ignore it, and keep trying to use the Job object what was originally created. We can fix that, by modifying updateStatus to not overwrite Job.status until it knows that the updated status is not null. But that was not explicitly part of this JIRA, and I could not find any other JIRA to cover it so I though it was not a big deal, and most likely worked as designed.
        Hide
        Aaron Baff added a comment -

        I haven't looked, but I could see that potentially being the case with all of the *.downgrade() methods. They might just assume that the new API returns an object no matter what, even if it doesn't contain any data.

        Show
        Aaron Baff added a comment - I haven't looked, but I could see that potentially being the case with all of the *.downgrade() methods. They might just assume that the new API returns an object no matter what, even if it doesn't contain any data.
        Hide
        Chris Douglas added a comment -

        Is there another NPE in JobClient::getJob()? It looks like exactly the same issue, but with JobStatus.

        Show
        Chris Douglas added a comment - Is there another NPE in JobClient::getJob() ? It looks like exactly the same issue, but with JobStatus .
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12479464/MAPREDUCE-2470-v2.patch
        against trunk revision 1103993.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/256//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/256//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/256//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12479464/MAPREDUCE-2470-v2.patch against trunk revision 1103993. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/256//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/256//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/256//console This message is automatically generated.
        Hide
        Robert Joseph Evans added a comment -

        Canceling the patch to address the comments to V1. Will post V2 shortly.

        Show
        Robert Joseph Evans added a comment - Canceling the patch to address the comments to V1. Will post V2 shortly.
        Hide
        Robert Joseph Evans added a comment -

        Sorry, I misunderstood the comment from before. Yes I think it would be fine for us to not modify Counters.downgrade, and instead have NetworkedJob check for null to avoid calling downgrade at all.

        All other instances of calling *.downgrade from inside NetworedJob can never have a null passed to them, except possibly in the odd case where a null was explicitly set programatically, or possibly when the jobtracker completely loses a job and client.getJobStatus returns null and the first IOExeption is ignored. Either way it takes someone working hard to make it through an NPE.

        I looked and could not find an appropriate place to add in the test to an existing file, so I will create a new Unit Test file under org.apache.hadoop.mapred for it. Unless you know of a better place to put it.

        Show
        Robert Joseph Evans added a comment - Sorry, I misunderstood the comment from before. Yes I think it would be fine for us to not modify Counters.downgrade, and instead have NetworkedJob check for null to avoid calling downgrade at all. All other instances of calling *.downgrade from inside NetworedJob can never have a null passed to them, except possibly in the odd case where a null was explicitly set programatically, or possibly when the jobtracker completely loses a job and client.getJobStatus returns null and the first IOExeption is ignored. Either way it takes someone working hard to make it through an NPE. I looked and could not find an appropriate place to add in the test to an existing file, so I will create a new Unit Test file under org.apache.hadoop.mapred for it. Unless you know of a better place to put it.
        Hide
        Robert Joseph Evans added a comment -

        I will update the typo and move the test, thanks for the catch.

        I will also investigate the other Downgrade methods. I believe that none of them will ever be called with a null, unless something is seriously wrong, but I will investigate to be sure.

        Show
        Robert Joseph Evans added a comment - I will update the typo and move the test, thanks for the catch. I will also investigate the other Downgrade methods. I believe that none of them will ever be called with a null, unless something is seriously wrong, but I will investigate to be sure.
        Hide
        Chris Douglas added a comment -

        Minor comments on v1:

        • Typo in testNullCounterDoengrade
        • TestCounters might be a better location for the unit test than TestJobCounters, which is an integration test (starts a MiniMRCluster, etc.)

        More generally: this would not match the other API downgrade methods (e.g. the *ID, *Status, and *Report classes), which throw NPEs. Would it make sense for the client to handle the null RPC, instead?

        Show
        Chris Douglas added a comment - Minor comments on v1: Typo in testNullCounterDoengrade TestCounters might be a better location for the unit test than TestJobCounters , which is an integration test (starts a MiniMRCluster , etc.) More generally: this would not match the other API downgrade methods (e.g. the *ID, *Status, and *Report classes), which throw NPEs. Would it make sense for the client to handle the null RPC, instead?
        Hide
        Robert Joseph Evans added a comment -

        The contrib test failure is org.apache.hadoop.raid.TestRaidShellFsck.testFileBlockAndParityBlockMissingHar2 which has been failing for a while and is a known issue.

        Show
        Robert Joseph Evans added a comment - The contrib test failure is org.apache.hadoop.raid.TestRaidShellFsck.testFileBlockAndParityBlockMissingHar2 which has been failing for a while and is a known issue.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12478982/MAPREDUCE-2470-v1.patch
        against trunk revision 1102363.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/239//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/239//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/239//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478982/MAPREDUCE-2470-v1.patch against trunk revision 1102363. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/239//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/239//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/239//console This message is automatically generated.
        Hide
        Robert Joseph Evans added a comment -

        Looks like Jenkins did not see the patch, so trying again

        Show
        Robert Joseph Evans added a comment - Looks like Jenkins did not see the patch, so trying again
        Hide
        Robert Joseph Evans added a comment -

        Added Version 1 of the patch. It simply avoids the NPE by returning null from getCounters after the job has retired, just like the Job class does.

        Show
        Robert Joseph Evans added a comment - Added Version 1 of the patch. It simply avoids the NPE by returning null from getCounters after the job has retired, just like the Job class does.
        Hide
        Robert Joseph Evans added a comment -

        Yes I totally agree, we need to update the documentation, and make sure no NPE happens. I also wanted to look at the other methods in RunningJob to be sure that they all behave correctly after the job has been retired. I don't know if there is a way to look it up in memory and return it even after the job has been retired, I will look at it and see.

        Show
        Robert Joseph Evans added a comment - Yes I totally agree, we need to update the documentation, and make sure no NPE happens. I also wanted to look at the other methods in RunningJob to be sure that they all behave correctly after the job has been retired. I don't know if there is a way to look it up in memory and return it even after the job has been retired, I will look at it and see.
        Hide
        Aaron Baff added a comment -

        NULL is a reasonable return if it isn't able to get the counters. What if the JobTracker has the data cached in memory? Couldn't it be possible to return the values from there instead of the HDFS file? Or would that be a bunch of work to see if it is still in memory?

        Just wanted to highlight that this is an NPE at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77), not just needing to get the Javadocs updated. Should be very quick and easy to simply add a if( counters == null )

        { return null; }

        before the for-each loop.

        Show
        Aaron Baff added a comment - NULL is a reasonable return if it isn't able to get the counters. What if the JobTracker has the data cached in memory? Couldn't it be possible to return the values from there instead of the HDFS file? Or would that be a bunch of work to see if it is still in memory? Just wanted to highlight that this is an NPE at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77), not just needing to get the Javadocs updated. Should be very quick and easy to simply add a if( counters == null ) { return null; } before the for-each loop.
        Hide
        Robert Joseph Evans added a comment -

        That was the fix we were looking at. Is making sure that RunningJob is documented properly to indicate what it will return for each API when the job is retired. I was thinking simply to return a null for the counters, which is what the client API returns for that same situation. I should be able to have a patch for it fairly quickly.

        Show
        Robert Joseph Evans added a comment - That was the fix we were looking at. Is making sure that RunningJob is documented properly to indicate what it will return for each API when the job is retired. I was thinking simply to return a null for the counters, which is what the client API returns for that same situation. I should be able to have a patch for it fairly quickly.
        Hide
        Aaron Baff added a comment -

        Yes, it's quite reproducible, at least for me. If a MR Job has been retired for around mapreduce.jobtracker.persist.jobstatus.hours hours (default 1 hour), then it is removed by the JobTracker from mapreduce.jobtracker.persist.jobstatus.dir (default /jobtracker/jobsInfo on HDFS). Once it's removed, the Counters are no longer available when fetching information about the Job. Going one step further, the JobTracker caches the last 1000 (default) MR Job info's in memory, yet even if it's still available in the cache and can be viewed through the Web UI, when you try and fetch the Counters programmatically but has been removed from HDFS by the JT, then you still get NULL returned from the JT, instead of an empty set of Counters.

        So, the real question for me, is this bad behavior by the JobTracker? Or should the Hadoop client library be the one to handle the NULL and either pass NULL along to user-code, or create an empty set of Counters? Whichever it is, it'd be very nice to have it documented in the Javadocs, including the old API as well as the new API.

        Show
        Aaron Baff added a comment - Yes, it's quite reproducible, at least for me. If a MR Job has been retired for around mapreduce.jobtracker.persist.jobstatus.hours hours (default 1 hour), then it is removed by the JobTracker from mapreduce.jobtracker.persist.jobstatus.dir (default /jobtracker/jobsInfo on HDFS). Once it's removed, the Counters are no longer available when fetching information about the Job. Going one step further, the JobTracker caches the last 1000 (default) MR Job info's in memory, yet even if it's still available in the cache and can be viewed through the Web UI, when you try and fetch the Counters programmatically but has been removed from HDFS by the JT, then you still get NULL returned from the JT, instead of an empty set of Counters. So, the real question for me, is this bad behavior by the JobTracker? Or should the Hadoop client library be the one to handle the NULL and either pass NULL along to user-code, or create an empty set of Counters? Whichever it is, it'd be very nice to have it documented in the Javadocs, including the old API as well as the new API.
        Hide
        Robert Joseph Evans added a comment -

        We saw this issue a little while ago through pig, but I had trouble reproducing it with pig. I am looking into it now, and wanted to know if you could reproduce it easily and if so how?

        Show
        Robert Joseph Evans added a comment - We saw this issue a little while ago through pig, but I had trouble reproducing it with pig. I am looking into it now, and wanted to know if you could reproduce it easily and if so how?
        Hide
        Aaron Baff added a comment -

        This appears to occur only when the Job has been retired, and the JobInfo data has been removed from HDFS (default dir is /jobtracker/jobInfo I believe). In that case, apparently the getJobCounters() call returns NULL, instead of a Counters object with no Counter's in it. Thus, in the Counters.downgrade() for the old MR API, it doesn't check for NULL, instead just uses the results in a for-each loop.

        Show
        Aaron Baff added a comment - This appears to occur only when the Job has been retired, and the JobInfo data has been removed from HDFS (default dir is /jobtracker/jobInfo I believe). In that case, apparently the getJobCounters() call returns NULL, instead of a Counters object with no Counter's in it. Thus, in the Counters.downgrade() for the old MR API, it doesn't check for NULL, instead just uses the results in a for-each loop.
        Hide
        Aaron Baff added a comment -

        Wireshark capture of RunningJob.getCounters() call.

        Show
        Aaron Baff added a comment - Wireshark capture of RunningJob.getCounters() call.

          People

          • Assignee:
            Robert Joseph Evans
            Reporter:
            Aaron Baff
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development