Hadoop Common
  1. Hadoop Common
  2. HADOOP-5247

NPEs in JobTracker and JobClient when mapred.jobtracker.completeuserjobs.maximum is set to zero.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.2, 0.20.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    1. HADOOP-5247-v1.1.patch
      23 kB
      Amar Kamat
    2. HADOOP-5247-v1.10.branch-0.19.patch
      7 kB
      Devaraj Das
    3. HADOOP-5247-v1.10.patch
      7 kB
      Amar Kamat
    4. HADOOP-5247-v1.4.patch
      9 kB
      Amar Kamat
    5. HADOOP-5247-v1.7.patch
      10 kB
      Amar Kamat
    6. HADOOP-5247-v1.8.patch
      24 kB
      Amar Kamat
    7. HADOOP-5247-v1.9.patch
      10 kB
      Amar Kamat

      Issue Links

        Activity

        Hide
        Vinod Kumar Vavilapalli added a comment -

        JobTracker: These exceptions occurred in JT and affected heartbeats. After a while, JT was doing very less useful work and mostly was returning exceptions in heartbeats and otherwise, of particular TTs running the jobs that were removed from memory.

        2009-02-08 02:15:19,743 INFO org.apache.hadoop.ipc.Server: IPC Server handler 19 on 57426, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@96136c, false, false, true, 9912) from <host:port>: error: java.io.I
        OException: java.lang.NullPointerException
        java.io.IOException: java.lang.NullPointerException
                at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:3103)
                at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:2484)
                at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2285)
                at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:396)
                at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
        
        2009-02-09 00:00:33,679 INFO org.apache.hadoop.ipc.Server: IPC Server handler 17 on 57426, call getTaskDiagnostics(attempt_200902061402_3468_m_000625_0) from <host:port>: error: java.io.IOException: java.lang.IllegalArgumentException: Job job_200902061402_3468 not found.
        java.io.IOException: java.lang.IllegalArgumentException: Job job_200902061402_3468 not found.
                at org.apache.hadoop.mapred.JobTracker.getTaskDiagnostics(JobTracker.java:2960)
                at sun.reflect.GeneratedMethodAccessor164.invoke(Unknown Source)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:396)
                at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
        
        Show
        Vinod Kumar Vavilapalli added a comment - JobTracker: These exceptions occurred in JT and affected heartbeats. After a while, JT was doing very less useful work and mostly was returning exceptions in heartbeats and otherwise, of particular TTs running the jobs that were removed from memory. 2009-02-08 02:15:19,743 INFO org.apache.hadoop.ipc.Server: IPC Server handler 19 on 57426, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@96136c, false , false , true , 9912) from <host:port>: error: java.io.I OException: java.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:3103) at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:2484) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2285) at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953) 2009-02-09 00:00:33,679 INFO org.apache.hadoop.ipc.Server: IPC Server handler 17 on 57426, call getTaskDiagnostics(attempt_200902061402_3468_m_000625_0) from <host:port>: error: java.io.IOException: java.lang.IllegalArgumentException: Job job_200902061402_3468 not found. java.io.IOException: java.lang.IllegalArgumentException: Job job_200902061402_3468 not found. at org.apache.hadoop.mapred.JobTracker.getTaskDiagnostics(JobTracker.java:2960) at sun.reflect.GeneratedMethodAccessor164.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
        Hide
        Vinod Kumar Vavilapalli added a comment -

        One more on JT

        2009-02-11 06:01:56,813 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 58069, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@1d388f3, false, false, true, 9333) from <host:port>: error: java.io.IOException: ja
        va.lang.NullPointerException
        java.io.IOException: java.lang.NullPointerException
                at org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2541)
                at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2324)
                at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
                at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:396)
                at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
        
        Show
        Vinod Kumar Vavilapalli added a comment - One more on JT 2009-02-11 06:01:56,813 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 58069, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@1d388f3, false , false , true , 9333) from <host:port>: error: java.io.IOException: ja va.lang.NullPointerException java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.mapred.JobTracker.getTasksToSave(JobTracker.java:2541) at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2324) at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
        Hide
        Vinod Kumar Vavilapalli added a comment -

        On the JobClient:

        java.lang.NullPointerException
                at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1284)
                at org.apache.hadoop.mapred.GenericMRLoadGenerator.run(GenericMRLoadGenerator.java:174)
                at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
                at org.apache.hadoop.mapred.GenericMRLoadGenerator.main(GenericMRLoadGenerator.java:188)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
                at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
                at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:81)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
        
        java.lang.NullPointerException
                at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1279)
                at org.apache.hadoop.mapred.GenericMRLoadGenerator.run(GenericMRLoadGenerator.java:174)
                at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
                at org.apache.hadoop.mapred.GenericMRLoadGenerator.main(GenericMRLoadGenerator.java:188)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
                at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
                at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:81)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
        
        java.lang.NullPointerException
                at org.apache.hadoop.mapred.JobClient$NetworkedJob.isComplete(JobClient.java:287)
                at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1279)
                at org.apache.hadoop.mapred.GenericMRLoadGenerator.run(GenericMRLoadGenerator.java:174)
                at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
                at org.apache.hadoop.mapred.GenericMRLoadGenerator.main(GenericMRLoadGenerator.java:188)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
                at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
                at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:81)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
        
        java.lang.NullPointerException
                at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1359)
                at org.apache.hadoop.examples.Grep.run(Grep.java:69)
                at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
                at org.apache.hadoop.examples.Grep.main(Grep.java:93)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
                at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
                at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
        
        java.lang.NullPointerException
                at org.apache.hadoop.mapred.JobClient$NetworkedJob.isSuccessful(JobClient.java:297)
                at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1355)
                at org.apache.hadoop.mapred.GenericMRLoadGenerator.run(GenericMRLoadGenerator.java:174)
                at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
                at org.apache.hadoop.mapred.GenericMRLoadGenerator.main(GenericMRLoadGenerator.java:188)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
                at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
                at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:81)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
        
        Show
        Vinod Kumar Vavilapalli added a comment - On the JobClient: java.lang.NullPointerException at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1284) at org.apache.hadoop.mapred.GenericMRLoadGenerator.run(GenericMRLoadGenerator.java:174) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.mapred.GenericMRLoadGenerator.main(GenericMRLoadGenerator.java:188) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68) at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139) at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:81) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) java.lang.NullPointerException at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1279) at org.apache.hadoop.mapred.GenericMRLoadGenerator.run(GenericMRLoadGenerator.java:174) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.mapred.GenericMRLoadGenerator.main(GenericMRLoadGenerator.java:188) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68) at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139) at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:81) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) java.lang.NullPointerException at org.apache.hadoop.mapred.JobClient$NetworkedJob.isComplete(JobClient.java:287) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1279) at org.apache.hadoop.mapred.GenericMRLoadGenerator.run(GenericMRLoadGenerator.java:174) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.mapred.GenericMRLoadGenerator.main(GenericMRLoadGenerator.java:188) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68) at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139) at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:81) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) java.lang.NullPointerException at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1359) at org.apache.hadoop.examples.Grep.run(Grep.java:69) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.examples.Grep.main(Grep.java:93) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68) at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) java.lang.NullPointerException at org.apache.hadoop.mapred.JobClient$NetworkedJob.isSuccessful(JobClient.java:297) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1355) at org.apache.hadoop.mapred.GenericMRLoadGenerator.run(GenericMRLoadGenerator.java:174) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.mapred.GenericMRLoadGenerator.main(GenericMRLoadGenerator.java:188) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68) at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139) at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:81) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
        Hide
        Amar Kamat added a comment -

        As of today JobTracker doesn't support 0 completed jobs in memory configuration. HADOOP-5083 will remove this configuration completely for trunk. Will provide a patch for 0.20 and trunk.

        Show
        Amar Kamat added a comment - As of today JobTracker doesn't support 0 completed jobs in memory configuration. HADOOP-5083 will remove this configuration completely for trunk. Will provide a patch for 0.20 and trunk.
        Hide
        Hemanth Yamijala added a comment -

        While it is true that the 0 as the number of completed jobs is not supported, the point of this JIRA is that there is a condition where if jobs are swapped out, and some tasktracker or client comes in asking for that completed job, there seem to be exceptions.

        So, while setting to 0 might aggravate the problem largely, it is still an issue IMO.

        Is my understanding correct, Amar ?

        Show
        Hemanth Yamijala added a comment - While it is true that the 0 as the number of completed jobs is not supported, the point of this JIRA is that there is a condition where if jobs are swapped out, and some tasktracker or client comes in asking for that completed job, there seem to be exceptions. So, while setting to 0 might aggravate the problem largely, it is still an issue IMO. Is my understanding correct, Amar ?
        Hide
        Amar Kamat added a comment -

        Hemanth, you are right. When a task tracker or a jobclient contacts the jobtracker for updates, progress or info, its assumed that the job is present in the memory, which might not always be true.

        Show
        Amar Kamat added a comment - Hemanth, you are right. When a task tracker or a jobclient contacts the jobtracker for updates, progress or info, its assumed that the job is present in the memory, which might not always be true.
        Hide
        Hemanth Yamijala added a comment -

        Thanks, Amar. Based on this explanation, I am marking this a blocker.

        Show
        Hemanth Yamijala added a comment - Thanks, Amar. Based on this explanation, I am marking this a blocker.
        Hide
        Amar Kamat added a comment -

        Attaching a patch the should fix these exceptions. This patch allows mapred.jobtracker.completeuserjobs.maximum to be 0. Also provided a testcase to test 'mapred.jobtracker.completeuserjobs.maximum'.

        Show
        Amar Kamat added a comment - Attaching a patch the should fix these exceptions. This patch allows mapred.jobtracker.completeuserjobs.maximum to be 0. Also provided a testcase to test 'mapred.jobtracker.completeuserjobs.maximum'.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Looked at the patch.

        One thing. The problem with the client-side exceptions are not the exceptions themselves, but the lack of information given by them. So we should continue to throw exceptions on the client but with appropriate error message. For e.g. this code is already in place. (JobClient.java +1283, runJob())

                  if (running == null) {
                    throw new IOException("Unable to fetch job status from server.");
                  }
          

        We should have similar code inside NetworkedJob.isComplete() and anywhere else JobStatus can become null. Also the above mentioned code can be moved to the beginning of the try catch block so that "running" job itself doesn't become null.

        Other code comments:

        • trackerToJobsToCleanup doesn't need to be a TreeMap, a HashMap should do.
        • addJobToKill and getJobsToKill don't need to be locked on JT. I think we should protect all the accesses of trackerToJobsToCleanup by a lock on itself and not on JobTracker.
        • In getJobsToKill, you can combine trackerToJobsToCleanup.get(taskTracker) and trackerToJobsToCleanup.remove(taskTracker) into a single call Set<JobID> jobs = trackerToJobsToCleanup.remove(taskTracker)
        • The code change for null job in updateTaskStatuses() can be moved further up above the lookup in taskidToTIPMap.
        • In the testcase, IntSumReducer is duplicated and can be reused from org.apache.hadoop.examples.WordCount as examples are on the classpath of the junit tests
        Show
        Vinod Kumar Vavilapalli added a comment - Looked at the patch. One thing. The problem with the client-side exceptions are not the exceptions themselves, but the lack of information given by them. So we should continue to throw exceptions on the client but with appropriate error message. For e.g. this code is already in place. (JobClient.java +1283, runJob()) if (running == null ) { throw new IOException( "Unable to fetch job status from server." ); } We should have similar code inside NetworkedJob.isComplete() and anywhere else JobStatus can become null. Also the above mentioned code can be moved to the beginning of the try catch block so that "running" job itself doesn't become null. Other code comments: trackerToJobsToCleanup doesn't need to be a TreeMap, a HashMap should do. addJobToKill and getJobsToKill don't need to be locked on JT. I think we should protect all the accesses of trackerToJobsToCleanup by a lock on itself and not on JobTracker. In getJobsToKill, you can combine trackerToJobsToCleanup.get(taskTracker) and trackerToJobsToCleanup.remove(taskTracker) into a single call Set<JobID> jobs = trackerToJobsToCleanup.remove(taskTracker) The code change for null job in updateTaskStatuses() can be moved further up above the lookup in taskidToTIPMap. In the testcase, IntSumReducer is duplicated and can be reused from org.apache.hadoop.examples.WordCount as examples are on the classpath of the junit tests
        Hide
        Amar Kamat added a comment -

        Attaching a patch that does the following

        • Sends "Kill-Job-Action" to all the trackers upon a job completion.
        • RunningJob doesnt do any update if the job is completed. If RunningJob tries to get an update for a job which is missing with the JobTracker then an exception is raised to inform that the job no longer exists.
        • trackerToJobsToCleanup is made a HashMap
        • Locking is now fined grained. May be we should do the same for other methods like getTasksToKill(). I think the reason its done this way is to keep it simple as there is no need to make these calls thread-safe.
        • getJobsToKill now does Set<JobID> jobs = trackerToJobsToCleanup.remove(taskTracker)
        • updateTaskStatuses() : Here its important to remove the attempt from the expiry queue irrespective of the job and hence the first step is to remove the expiring task.
        • The testcase is removed as the test has nothing to do with this issue
        Show
        Amar Kamat added a comment - Attaching a patch that does the following Sends "Kill-Job-Action" to all the trackers upon a job completion. RunningJob doesnt do any update if the job is completed. If RunningJob tries to get an update for a job which is missing with the JobTracker then an exception is raised to inform that the job no longer exists. trackerToJobsToCleanup is made a HashMap Locking is now fined grained. May be we should do the same for other methods like getTasksToKill(). I think the reason its done this way is to keep it simple as there is no need to make these calls thread-safe. getJobsToKill now does Set<JobID> jobs = trackerToJobsToCleanup.remove(taskTracker) updateTaskStatuses() : Here its important to remove the attempt from the expiry queue irrespective of the job and hence the first step is to remove the expiring task. The testcase is removed as the test has nothing to do with this issue
        Hide
        Amar Kamat added a comment -

        Fixing the synchronization of trackerToJobsToCleanup in lostTaskTracker() and in getJobsForCleanup().

        Show
        Amar Kamat added a comment - Fixing the synchronization of trackerToJobsToCleanup in lostTaskTracker() and in getJobsForCleanup() .
        Hide
        Amar Kamat added a comment -

        Attaching a patch incoporating Vinod's offline comments. Result of test-case

        [exec] +1 overall.  
             [exec] 
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec] 
             [exec]     +1 tests included.  The patch appears to include 11 new or modified tests.
             [exec] 
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec] 
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec] 
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec] 
             [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
             [exec] 
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        
        Show
        Amar Kamat added a comment - Attaching a patch incoporating Vinod's offline comments. Result of test-case [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 11 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
        Hide
        Amareshwari Sriramadasu added a comment -

        Patch looks good.
        Looks like you forgot to remove the testcases.

        Show
        Amareshwari Sriramadasu added a comment - Patch looks good. Looks like you forgot to remove the testcases.
        Hide
        Amar Kamat added a comment -

        Attaching a patch the removes the testcase. Fixes a minor nit.

        Show
        Amar Kamat added a comment - Attaching a patch the removes the testcase. Fixes a minor nit.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Patch looks good to me too. +1.

        Show
        Vinod Kumar Vavilapalli added a comment - Patch looks good to me too. +1.
        Hide
        Devaraj Das added a comment -

        Ok the one thing that needs to be changed I think is the changes to the JobClient side of things. The JobClient changes is more of a feature change and should be done in a separate jira. Please submit a new one with the jobclient changes removed.

        Show
        Devaraj Das added a comment - Ok the one thing that needs to be changed I think is the changes to the JobClient side of things. The JobClient changes is more of a feature change and should be done in a separate jira. Please submit a new one with the jobclient changes removed.
        Hide
        Amar Kamat added a comment -

        Attaching a patch the removes all the changes made for JobClient.

        Show
        Amar Kamat added a comment - Attaching a patch the removes all the changes made for JobClient .
        Hide
        Devaraj Das added a comment -

        I just committed this. Thanks, Amar! (After 0.19.1 is released for which voting is going on, we should commit this to 0.19 branch as well)

        Show
        Devaraj Das added a comment - I just committed this. Thanks, Amar! (After 0.19.1 is released for which voting is going on, we should commit this to 0.19 branch as well)
        Hide
        Hudson added a comment -
        Show
        Hudson added a comment - Integrated in Hadoop-trunk #763 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/763/ )
        Hide
        Devaraj Das added a comment -

        Attaching the patch for 0.19

        Show
        Devaraj Das added a comment - Attaching the patch for 0.19
        Hide
        Devaraj Das added a comment -

        I committed this to the 0.19 branch.

        Show
        Devaraj Das added a comment - I committed this to the 0.19 branch.
        Hide
        Charles Butterfield added a comment -

        I'm running 0.20.2 and have just gotten an NPE calling NetworkedJob.isComplete(), while the DFS is under heavy stress (multiple node removed from service).
        This seems to be exactly the case that Vinod identified last year. I could not find any open ticket. Should this ticket be reopened? Please advice.

        java.lang.NullPointerException
                at org.apache.hadoop.mapred.JobClient$NetworkedJob.isComplete(JobClient.java:287)
        
        Show
        Charles Butterfield added a comment - I'm running 0.20.2 and have just gotten an NPE calling NetworkedJob.isComplete(), while the DFS is under heavy stress (multiple node removed from service). This seems to be exactly the case that Vinod identified last year. I could not find any open ticket. Should this ticket be reopened? Please advice. java.lang.NullPointerException at org.apache.hadoop.mapred.JobClient$NetworkedJob.isComplete(JobClient.java:287)
        Hide
        Amareshwari Sriramadasu added a comment -

        In branch 0.21, MAPREDUCE-777 changed JobClient to return IOException instead of NullPointerException in the above scenario.

        Show
        Amareshwari Sriramadasu added a comment - In branch 0.21, MAPREDUCE-777 changed JobClient to return IOException instead of NullPointerException in the above scenario.

          People

          • Assignee:
            Amar Kamat
            Reporter:
            Vinod Kumar Vavilapalli
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development