  1. Hadoop Map/Reduce
  2. MAPREDUCE-5406

Improve logging around Task Tracker exiting with JVM manager inconsistent state


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1-win, 1.3.0
    • Fix Version/s: 1-win, 1.3.0
    • Component/s: tasktracker
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      It looks like the JVM manager is reaching an inconsistent state, which causes the TaskTracker (TT) to crash:

      2013-06-09 06:41:11,250 FATAL org.apache.hadoop.mapred.JvmManager: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_201306080400_104812_m_000001_0 Number of active JVMs:8
        JVMId jvm_201306080400_104517_m_1331138312 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104517_m_000001_0
        JVMId jvm_201306080400_104641_m_-1631395161 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104641_m_000000_0
        JVMId jvm_201306080400_104494_m_-1702464703 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104494_m_000000_0
        JVMId jvm_201306080400_104784_m_1407576088 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104784_m_000000_0
        JVMId jvm_201306080400_104530_m_186665365 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104530_m_000000_0
        JVMId jvm_201306080400_104589_m_-1080246077 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104589_m_000000_0
        JVMId jvm_201306080400_104674_m_830017814 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104674_m_000000_0
        JVMId jvm_201306080400_104719_m_-226910128 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104719_m_000000_0. Aborting. 
      2013-06-09 06:41:11,250 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG: 
      

      Although this causes the TT to crash, the error is rare and recoverable, so the priority of this issue is not high.

      However, this does look like a bug in the JVM manager state machine. I'm guessing there is some race condition that we're hitting.

      (Logs attached)
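
For triage, the FATAL message quoted above can be broken into its parts. The following is a minimal sketch, not part of the attached patch: the function name `parse_inconsistent_state` and the regular expressions are assumptions derived only from the excerpt in this description, so entries for idle JVMs (whose format is not shown here) may not match. It also reports whether the attempt being reaped appears among the dumped JVMs.

```python
import re

# Patterns inferred from the log excerpt in this issue (assumption: the
# message layout is the same elsewhere in the attached log).
REAPED = re.compile(r"reaping a JVM for task: (?P<attempt>attempt_\w+)")
ACTIVE = re.compile(r"Number of active JVMs:(?P<count>\d+)")
JVM_ENTRY = re.compile(
    r"JVMId (?P<jvm_id>jvm_\d+_\d+_[mr]_-?\d+) "
    r"#Tasks ran: (?P<tasks_ran>\d+) "
    r"Currently busy\? (?P<busy>true|false) "
    r"Currently running: (?P<attempt>attempt_\d+_\d+_[mr]_\d+_\d+)"
)


def parse_inconsistent_state(message):
    """Extract the reaped attempt, the active-JVM count, and per-JVM entries."""
    reaped = REAPED.search(message)
    active = ACTIVE.search(message)
    jvms = [
        {
            "jvm_id": m.group("jvm_id"),
            "tasks_ran": int(m.group("tasks_ran")),
            "busy": m.group("busy") == "true",
            "attempt": m.group("attempt"),
        }
        for m in JVM_ENTRY.finditer(message)
    ]
    reaped_attempt = reaped.group("attempt") if reaped else None
    return {
        "reaped_attempt": reaped_attempt,
        "active_jvms": int(active.group("count")) if active else None,
        "jvms": jvms,
        # True if the attempt being reaped is listed as running in some JVM.
        "reaped_in_dump": any(j["attempt"] == reaped_attempt for j in jvms),
    }
```

Applied to the message in this description, the parsed entries can be compared against the reported count of active JVMs, and `reaped_in_dump` shows whether the reaped attempt is among the attempts listed as currently running.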

        Attachments

        1. hadoop-tasktracker-RD00155D61582F-short.log
          2.95 MB
          Chelsey Chang
        2. MAPREDUCE-5406.branch-1-win.1.patch
          4 kB
          Chelsey Chang

            People

            • Assignee:
              checha Chelsey Chang
              Reporter:
              checha Chelsey Chang

              Dates

              • Created:
                Updated:
                Resolved: