After looking at the TaskTracker logs, we found the problem is as follows:
One of the task attempts failed to launch its JVM. The finally block of JvmRunner.runChild() calls kill(), which calls terminateTask(), which also fails. It then sleeps for the configured duration (default 5 seconds) and calls killTask(). After that, kill() removes the jvmId mapping from the jvmIdToRunner map.
Meanwhile, there was a killTaskAction for the same attempt from the TaskTracker. This call removes the jvmId mapping from jvmToRunningTask. It then sees that JvmRunner.kill() has already been called, so it goes ahead and releases the slot.
Since there are now free slots, the TaskTracker tries to launch a task and finds the JvmManager in an inconsistent state, because the JVM has not yet been removed from the jvmIdToRunner map. When it tries to look up the details through getDetails(), it gets a NullPointerException, since jvmToRunningTask has no entry for that JVM.
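The window described above can be reproduced in a simplified model. This is only a sketch: the class JvmStateSketch, its String-typed maps and the helper methods register() and finishKill() are hypothetical stand-ins, not the real JvmManager code. It only shows how an entry that is still in jvmIdToRunner but already gone from jvmToRunningTask leads to a NullPointerException during the lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the two maps; not the actual JvmManager.
class JvmStateSketch {
    // jvmId -> runner; in the scenario this entry is removed only at the end of kill()
    final Map<String, String> jvmIdToRunner = new HashMap<>();
    // jvmId -> running task attempt; removed by the killTaskAction path
    final Map<String, String> jvmToRunningTask = new HashMap<>();

    // Hypothetical helper: records a launched JVM in both maps.
    void register(String jvmId, String attemptId) {
        jvmIdToRunner.put(jvmId, "runner-for-" + jvmId);
        jvmToRunningTask.put(jvmId, attemptId);
    }

    // Models the killTaskAction path: only the task mapping is dropped.
    void killTaskAction(String jvmId) {
        jvmToRunningTask.remove(jvmId);
    }

    // Hypothetical helper: models the delayed removal at the end of kill().
    void finishKill(String jvmId) {
        jvmIdToRunner.remove(jvmId);
    }

    // Models getDetails(): walks the runners and dereferences the task entry.
    String getDetails() {
        StringBuilder sb = new StringBuilder();
        for (String jvmId : jvmIdToRunner.keySet()) {
            String attempt = jvmToRunningTask.get(jvmId); // null inside the race window
            sb.append(jvmId).append("=").append(attempt.trim()).append(" "); // NPE if null
        }
        return sb.toString();
    }
}
```

Calling register(), then killTaskAction(), then getDetails() before finishKill() hits the null entry. Once finishKill() has run, the two maps agree again and the lookup succeeds.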
I think JvmRunner.kill() should not call back into JvmManager to remove the jvmId mapping from the jvmIdToRunner map. The removal should be done by the callers of kill(), i.e. killJvm(), stop() and reapJvm(). JvmRunner.runChild() already does this through updateOnJvmExit(), the next method call after kill().
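A rough sketch of the proposed split of responsibilities, under simplifying assumptions: JvmManagerFixSketch, its nested Runner class, and the launch() and hasRunner() helpers are hypothetical, and only killJvm() is modelled (stop() and reapJvm() would follow the same pattern). It is not the actual patch. The idea shown is that kill() only terminates the JVM, while the caller updates jvmIdToRunner.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the proposal; not the real JvmManager/JvmRunner classes.
class JvmManagerFixSketch {
    static class Runner {
        boolean killed = false;

        // kill() only terminates the process; it no longer touches the manager's maps.
        void kill() {
            killed = true;
        }
    }

    private final Map<String, Runner> jvmIdToRunner = new HashMap<>();

    // Hypothetical helper: registers a runner for a newly launched JVM.
    synchronized Runner launch(String jvmId) {
        Runner r = new Runner();
        jvmIdToRunner.put(jvmId, r);
        return r;
    }

    // The caller of kill() owns the removal of the jvmId mapping.
    synchronized void killJvm(String jvmId) {
        Runner r = jvmIdToRunner.remove(jvmId);
        if (r != null) {
            r.kill();
        }
    }

    // Hypothetical helper: reports whether a runner is still mapped.
    synchronized boolean hasRunner(String jvmId) {
        return jvmIdToRunner.containsKey(jvmId);
    }
}
```

In this model the mapping is gone as soon as killJvm() returns, so a later lookup does not see a runner that is still waiting to be cleaned up.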