Hadoop Map/Reduce / MAPREDUCE-5406

Improve logging around Task Tracker exiting with JVM manager inconsistent state

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1-win, 1.3.0
    • Fix Version/s: 1-win, 1.3.0
    • Component/s: tasktracker
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      It looks like we are reaching a JVM manager inconsistent state, which causes the TaskTracker (TT) to crash:

      2013-06-09 06:41:11,250 FATAL org.apache.hadoop.mapred.JvmManager: Inconsistent state!!! JVM Manager reached an unstable state while reaping a JVM for task: attempt_201306080400_104812_m_000001_0 Number of active JVMs:8
        JVMId jvm_201306080400_104517_m_1331138312 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104517_m_000001_0
        JVMId jvm_201306080400_104641_m_-1631395161 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104641_m_000000_0
        JVMId jvm_201306080400_104494_m_-1702464703 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104494_m_000000_0
        JVMId jvm_201306080400_104784_m_1407576088 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104784_m_000000_0
        JVMId jvm_201306080400_104530_m_186665365 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104530_m_000000_0
        JVMId jvm_201306080400_104589_m_-1080246077 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104589_m_000000_0
        JVMId jvm_201306080400_104674_m_830017814 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104674_m_000000_0
        JVMId jvm_201306080400_104719_m_-226910128 #Tasks ran: 0 Currently busy? true Currently running: attempt_201306080400_104719_m_000000_0. Aborting. 
      2013-06-09 06:41:11,250 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG: 
      

      Although this causes the TT to crash, the error is rare and recoverable, so the priority of this issue is not high.

      However, this does look like a bug in the JVM manager state machine. I'm guessing there is some race condition that we're hitting.

      (Logs attached)
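
For triaging the attached log, the FATAL dump can be summarized with a short script. The following is a minimal sketch, not part of the attached patch: the function name `summarize_dump` and the result keys are hypothetical, and the regular expressions assume the dump format shown in the excerpt above (in particular, that every listed JVM reports a running attempt).

```python
import re

# One per-JVM entry in the FATAL dump. The field layout is an assumption
# taken from the log excerpt in this issue, not from the Hadoop source.
JVM_ENTRY = re.compile(
    r"JVMId (?P<jvm>jvm_\d+_\d+_[mr]_-?\d+) "
    r"#Tasks ran: (?P<ran>\d+) "
    r"Currently busy\? (?P<busy>true|false) "
    r"Currently running: (?P<attempt>attempt_\d+_\d+_[mr]_\d+_\d+)"
)
# The attempt whose JVM was being reaped when the error fired.
REAPED = re.compile(r"reaping a JVM for task: (?P<attempt>attempt_\S+)")
# The JVM count reported in the message header.
ACTIVE = re.compile(r"Number of active JVMs:\s*(?P<count>\d+)")


def summarize_dump(text):
    """Summarize one 'Inconsistent state!!!' message (hypothetical helper)."""
    reaped = REAPED.search(text)
    active = ACTIVE.search(text)
    entries = [m.groupdict() for m in JVM_ENTRY.finditer(text)]
    running = {entry["attempt"] for entry in entries}
    return {
        "reaped_attempt": reaped.group("attempt") if reaped else None,
        "reported_active": int(active.group("count")) if active else None,
        "listed_jvms": len(entries),
        # True only if the reaped attempt appears among the listed JVMs.
        "reaped_attempt_listed": bool(reaped) and reaped.group("attempt") in running,
    }
```

Applied to the excerpt above, this should show that the attempt being reaped (attempt_201306080400_104812_m_000001_0) is not among the attempts listed for the eight active JVMs. That may be a useful starting point when looking for the suspected race, though it does not by itself identify the cause.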

      Attachments

        1. hadoop-tasktracker-RD00155D61582F-short.log
          2.95 MB
          Chelsey Chang
        2. MAPREDUCE-5406.branch-1-win.1.patch
          4 kB
          Chelsey Chang

          People

            Assignee: Chelsey Chang (checha)
            Reporter: Chelsey Chang (checha)
            Votes: 0
            Watchers: 2
