Hadoop Common / HADOOP-4595

JVM Reuse triggers RuntimeException("Invalid state")

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      A Reducer triggers the following exception:

      08/11/05 08:58:50 INFO mapred.JobClient: Task Id : attempt_200811040110_0230_r_000008_1, Status : FAILED
      java.lang.RuntimeException: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_200811040110_0230_r_000008_1 Number of active JVMs:2
      JVMId jvm_200811040110_0230_r_-735233075 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000012_0
      JVMId jvm_200811040110_0230_r_-1716942642 #Tasks ran: 0 Currently busy? true Currently running: attempt_200811040110_0230_r_000040_0
      at java.lang.Throwable.<init>(Throwable.java:67)
      at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:245)
      at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:113)
      at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:78)
      at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:410)

      Other clues:

      In all three reduce task attempts where this was observed, the failing attempt was _1. Attempt _0 had already started and eventually switched to "SUCCEEDED". So I think this happens only on speculatively executed reduce task attempts. The reduce output (part-XXXXX) is lost when this attempt fails, even though the other (earlier) attempt succeeded.
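
      To check the "attempt _1" clue against a larger log, the attempt number can be pulled out of each task attempt ID. The following is a minimal sketch, assuming the ID layout seen in the log above (attempt_<timestamp>_<job>_<m|r>_<task>_<attempt>). The class AttemptIdParser and its methods are hypothetical helpers, not part of Hadoop. A nonzero attempt number only shows that the task was run again; it does not by itself distinguish a speculative attempt from a retry after a failure.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Hadoop class): parses task attempt IDs
// of the form seen in the log above.
public class AttemptIdParser {
    // Assumed layout: attempt_<timestamp>_<job>_<m|r>_<task>_<attempt>
    private static final Pattern ATTEMPT_ID =
        Pattern.compile("attempt_(\\d+)_(\\d+)_([mr])_(\\d+)_(\\d+)");

    /** Returns the trailing attempt number, or -1 if the ID does not match. */
    public static int attemptNumber(String attemptId) {
        Matcher m = ATTEMPT_ID.matcher(attemptId);
        return m.matches() ? Integer.parseInt(m.group(5)) : -1;
    }

    /** Returns true if the ID matches and names a reduce task attempt. */
    public static boolean isReduce(String attemptId) {
        Matcher m = ATTEMPT_ID.matcher(attemptId);
        return m.matches() && "r".equals(m.group(3));
    }

    public static void main(String[] args) {
        String failed = "attempt_200811040110_0230_r_000008_1";
        // Prints the attempt number and whether this is a reduce attempt.
        System.out.println(attemptNumber(failed) + " " + isReduce(failed));
    }
}
```

      Filtering the JobClient log for FAILED reduce attempts whose attempt number is greater than 0 would show whether the failures are confined to re-executed attempts, as suspected.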

      Attachments

        1. 4595.patch
          7 kB
          Devaraj Das

          People

            Assignee: Devaraj Das (ddas)
            Reporter: Aaron Kimball (kimballa)
            Votes: 0
            Watchers: 3
