Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-5260

Job failed because of JvmManager running into inconsistent state

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.2
    • Fix Version/s: 1.2.1
    • Component/s: tasktracker
    • Labels:
      None

      Description

      In our cluster, jobs failed due to randomly task initialization failed because of JvmManager running into inconsistent state and TaskTracker failed to exit:

      java.lang.Throwable: Child Error
      at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
      Caused by: java.lang.NullPointerException
      at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.getDetails(JvmManager.java:402)
      at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:387)
      at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:192)
      at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:125)
      at org.apache.hadoop.mapred.TaskRunner.launchJvmAndWait(TaskRunner.java:292)
      at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:251)

      -------
      java.lang.Throwable: Child Error
      at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
      Caused by: java.lang.NullPointerException
      at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.getDetails(JvmManager.java:402)
      at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:387)
      at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:192)
      at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:125)
      at org.apache.hadoop.mapred.TaskRunner.launchJvmAndWait(TaskRunner.java:292)
      at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:251)

        Issue Links

          Activity

          Hide
          Arun C Murthy added a comment -

          I just committed this. Thanks zhaoyunjiong!

          Show
          Arun C Murthy added a comment - I just committed this. Thanks zhaoyunjiong!
          Hide
          Benoy Antony added a comment -

          reviewed. +1

          Show
          Benoy Antony added a comment - reviewed. +1
          Hide
          zhaoyunjiong added a comment -

          Update patch name for branch 1.1.

          Show
          zhaoyunjiong added a comment - Update patch name for branch 1.1.
          Hide
          zhaoyunjiong added a comment -

          The root cause of JvmManager running into inconsistent state is TaskTracker lack of user information:
          2013-05-14 07:01:31,482 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201305100625_20199_m_000431_0
          2013-05-14 07:01:31,485 INFO org.apache.hadoop.mapred.TaskController: Reading task controller config from /etc/hadoop/taskcontroller.cfg
          2013-05-14 07:01:31,485 INFO org.apache.hadoop.mapred.TaskController: User zhaoyunjiong not found
          2013-05-14 07:01:31,485 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.io.IOException: Problem signalling task 30048 with TERM; exit = 255
          at org.apache.hadoop.mapred.LinuxTaskController.signalTask(LinuxTaskController.java:319)
          at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.kill(JvmManager.java:555)
          at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvmRunner(JvmManager.java:317)
          at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvm(JvmManager.java:297)
          at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.taskKilled(JvmManager.java:289)
          at org.apache.hadoop.mapred.JvmManager.taskKilled(JvmManager.java:158)
          at org.apache.hadoop.mapred.TaskRunner.kill(TaskRunner.java:801)
          at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:3279)
          at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:3251)
          at org.apache.hadoop.mapred.TaskTracker.purgeTask(TaskTracker.java:2286)
          at org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java:2185)
          at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1862)
          at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2646)
          at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3900)

          This patch catch IOException throwed by LinuxTaskController to prevent inconsistent state.
          Also it make sure TT will shutdown itself when running into inconsistent state.

          Show
          zhaoyunjiong added a comment - The root cause of JvmManager running into inconsistent state is TaskTracker lack of user information: 2013-05-14 07:01:31,482 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201305100625_20199_m_000431_0 2013-05-14 07:01:31,485 INFO org.apache.hadoop.mapred.TaskController: Reading task controller config from /etc/hadoop/taskcontroller.cfg 2013-05-14 07:01:31,485 INFO org.apache.hadoop.mapred.TaskController: User zhaoyunjiong not found 2013-05-14 07:01:31,485 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.io.IOException: Problem signalling task 30048 with TERM; exit = 255 at org.apache.hadoop.mapred.LinuxTaskController.signalTask(LinuxTaskController.java:319) at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.kill(JvmManager.java:555) at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvmRunner(JvmManager.java:317) at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvm(JvmManager.java:297) at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.taskKilled(JvmManager.java:289) at org.apache.hadoop.mapred.JvmManager.taskKilled(JvmManager.java:158) at org.apache.hadoop.mapred.TaskRunner.kill(TaskRunner.java:801) at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:3279) at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:3251) at org.apache.hadoop.mapred.TaskTracker.purgeTask(TaskTracker.java:2286) at org.apache.hadoop.mapred.TaskTracker.markUnresponsiveTasks(TaskTracker.java:2185) at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1862) at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2646) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3900) This patch catch IOException throwed by LinuxTaskController to prevent inconsistent state. Also it make sure TT will shutdown itself when running into inconsistent state.

            People

            • Assignee:
              zhaoyunjiong
              Reporter:
              zhaoyunjiong
            • Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development