Details
- Bug
- Status: Closed
- Minor
- Resolution: Fixed
- 0.20.1
- None
- Reviewed
- Fixed a race condition between JvmRunner.kill() and KillTaskAction that caused a NullPointerException, leaving JvmManager in a transient inconsistent state and causing tasks to fail.
Description
In an environment where many jobs are killed simultaneously, NullPointerExceptions (NPEs) appear in the TaskTracker/JobTracker (TT/JT) logs when a task fails. The problem is aggravated when taskcontroller.cfg is not configured properly. The exception is shown below:
INFO org.apache.hadoop.mapred.TaskInProgress: Error from <attempt_ID>: java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:529)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.getDetails(JvmManager.java:329)
    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:315)
    at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:146)
    at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:109)
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:502)
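The trace shows the NPE surfacing in getDetails() while reapJvm() is running. The sketch below illustrates the general class of failure, not Hadoop's actual code or the committed fix: a kill path removes a bookkeeping entry, and a later lookup dereferences the missing entry. The class JvmRegistrySketch and all of its methods are hypothetical names introduced for illustration, and the assumption that the null came from a removed map entry is only one plausible reading of the trace.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a JVM bookkeeping table; NOT Hadoop's JvmManager. */
public class JvmRegistrySketch {
    // Maps a JVM id to the task attempt it is running (illustrative only).
    private final Map<String, String> jvmToTask = new HashMap<>();

    public synchronized void register(String jvmId, String taskAttempt) {
        jvmToTask.put(jvmId, taskAttempt);
    }

    /** Kill path: drops the entry, as a concurrent kill action might. */
    public synchronized void kill(String jvmId) {
        jvmToTask.remove(jvmId);
    }

    /** Unguarded lookup: throws NullPointerException if the entry was already removed. */
    public synchronized int unsafeDetailsLength(String jvmId) {
        return jvmToTask.get(jvmId).length();
    }

    /** Guarded lookup: treats a missing entry as "already killed" instead of dereferencing null. */
    public synchronized String safeDetails(String jvmId) {
        String taskAttempt = jvmToTask.get(jvmId);
        if (taskAttempt == null) {
            return "(jvm " + jvmId + " already removed)";
        }
        return jvmId + " -> " + taskAttempt;
    }
}
```

In this sketch, calling kill() between register() and unsafeDetailsLength() reproduces the NPE pattern, while safeDetails() tolerates the removed entry. Whether the real fix used a null check, tighter locking, or a different ordering of the kill steps is not stated in this report.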
Attachments
Issue Links
- relates to MAPREDUCE-5260 Job failed because of JvmManager running into inconsistent state (Closed)