Hadoop Map/Reduce / MAPREDUCE-3272

Lost NMs fail to rejoin

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.1
    • Component/s: mrv2
    • Labels: None

    Description

      NodeManagers (NMs) that have been marked LOST fail to rejoin the cluster.

      When the NM is lost, the ResourceManager (RM) log reads:

      INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:<host:port> Timed out after 600 secs
      INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Processing <host:port> of type EXPIRE
      INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Removed Node <host:port>
      INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: <host:port> Node Transitioned from RUNNING to LOST
      

      When the NM tries to join back, the RM log reads:

      INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node not found rebooting <host:port>
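
The sequence in the two log excerpts can be illustrated with a toy model. This is only a sketch inferred from the log lines above, not the actual ResourceManager code: the class `ToyResourceManager`, its methods, the return values "NORMAL" and "REBOOT", and the node id "nm-host:0" are all hypothetical names chosen for illustration. It shows the RM dropping a node after the 600-second expiry and then no longer recognising that node's next heartbeat. It does not attempt to show why the NM then fails to rejoin.

```python
class ToyResourceManager:
    """Toy model of the RM-side bookkeeping suggested by the log lines.

    Hypothetical illustration only; this is not Hadoop's implementation.
    """

    def __init__(self, expiry_secs=600):
        self.expiry_secs = expiry_secs
        self.nodes = {}  # node id -> time of last heartbeat
        self.log = []

    def register(self, node_id, now):
        # A registered node is tracked and considered RUNNING.
        self.nodes[node_id] = now

    def expire_stale(self, now):
        # Drop every node whose last heartbeat is older than the expiry.
        expired = [n for n, last in self.nodes.items()
                   if now - last >= self.expiry_secs]
        for node_id in expired:
            del self.nodes[node_id]
            self.log.append(f"{node_id} Node Transitioned from RUNNING to LOST")
        return expired

    def heartbeat(self, node_id, now):
        # A heartbeat from a node the RM no longer tracks is answered
        # with an instruction to reboot (assumed from the log message).
        if node_id not in self.nodes:
            self.log.append(f"Node not found rebooting {node_id}")
            return "REBOOT"
        self.nodes[node_id] = now
        return "NORMAL"


rm = ToyResourceManager()
rm.register("nm-host:0", now=0)
first = rm.heartbeat("nm-host:0", now=10)    # node is known
rm.expire_stale(now=700)                     # 690 s of silence: node removed
second = rm.heartbeat("nm-host:0", now=701)  # node is no longer known
print(first, second)
```

In this model the removed node is only tracked again after a fresh call to `register`, which corresponds to the rejoin step that the report says does not succeed.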
      

            People

              Assignee: Jonathan Turner Eagles (jeagles)
              Reporter: Ramya Sunil (rramya)