|
The stack trace for the exception is as follows:
2009-05-12 11:45:00,821 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 30300, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@1a4036f, false, false, true, 292) from x.x.x.x:x error: java.io.IOException: java.lang.NullPointerException
The reason for the above exception is as follows:
I was able to replicate the problem as follows:
At this point, in the JT log, we can see the above exception.
To fix this problem, we discussed two possible approaches.
Evaluating the two options, and considering how rarely this case can occur, we decided to take the conservative approach and favor consistency of state over utilization. Hence, the proposal is to go with the second option.
This patch fixes the NPE by implementing the proposal described in the comments above. It checks for null when looking up the job in MemoryMatcher, and returns nothing to the TT when that state is detected. It also includes a test case that simulates this condition.
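For readers following along, here is a minimal sketch of the null check described above. It is not the code from the patch: the class MemoryMatcherSketch, the JobInfo type, and all method names are hypothetical stand-ins, and it assumes that a retired job simply disappears from the table used for the lookup. It only illustrates the idea of refusing to hand out a task when a running task's job can no longer be found.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; none of these names come from the Hadoop code base.
class MemoryMatcherSketch {

    /** Stand-in for the per-task memory requirement of a job. */
    static final class JobInfo {
        final long memoryPerTaskMb;

        JobInfo(long memoryPerTaskMb) {
            this.memoryPerTaskMb = memoryPerTaskMb;
        }
    }

    // Assumption: retired jobs are removed from this table, so a lookup can return null.
    private final Map<String, JobInfo> liveJobs = new HashMap<>();

    public void addJob(String jobId, long memoryPerTaskMb) {
        liveJobs.put(jobId, new JobInfo(memoryPerTaskMb));
    }

    public void retireJob(String jobId) {
        liveJobs.remove(jobId);
    }

    /**
     * Sums the memory used by the tasks running on a tracker.
     * Returns an empty Optional if any task belongs to a job that
     * can no longer be looked up, instead of dereferencing null.
     */
    public Optional<Long> memoryInUse(List<String> jobIdsOfRunningTasks) {
        long usedMb = 0;
        for (String jobId : jobIdsOfRunningTasks) {
            JobInfo job = liveJobs.get(jobId);
            if (job == null) {
                // The job was retired while its task is still reported as running.
                return Optional.empty();
            }
            usedMb += job.memoryPerTaskMb;
        }
        return Optional.of(usedMb);
    }

    /**
     * Conservative choice: if the memory in use cannot be determined,
     * assign nothing on this heartbeat.
     */
    public boolean canAssignTask(long trackerCapacityMb,
                                 List<String> jobIdsOfRunningTasks,
                                 long requestedMb) {
        Optional<Long> usedMb = memoryInUse(jobIdsOfRunningTasks);
        if (!usedMb.isPresent()) {
            return false;
        }
        return trackerCapacityMb - usedMb.get() >= requestedMb;
    }
}
```

In this sketch the tracker gets no new task until the stale task disappears from its status report, which trades some utilization for a consistent view of memory, in line with the decision above.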
Both the unit test and manual tests were run with and without the patch to make sure the fix works.
Results of test-patch:
[exec] +1 overall.
As this patch touches only capacity scheduler tests, I ran all capacity scheduler tests and all passed.
I just committed this to trunk and branch 0.20.
Integrated in Hadoop-trunk #834 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/834/).
Fix a NullPointerException in capacity scheduler's memory based scheduling code when jobs get retired. Contributed by Hemanth Yamijala. |
The particular method call in question is