Hadoop YARN / YARN-8001

Newly created Yarn application ID lost after RM failover

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.3, 2.9.0
    • Fix Version/s: None
    • Component/s: RM
    • Labels: None

    Description

      I’ve seen a problem in Hadoop 2.7.3 where a newly submitted YARN application was lost after an RM failover. It looks like, when handling an application submission, the RM does not wait for the application to be written to the state store (we are using the ZooKeeper-based state store) before it responds to the client. In this case the RM then failed over to another RM, and all of its write calls to the state store failed. The new RM recovered its state from the state store, so this app was lost.
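
      The suspected sequence can be sketched with a toy model. This is not YARN code: the StateStore and ResourceManager classes, the fenced flag, and the old_rm/new_rm names are all hypothetical and only mirror the hypothesis above, namely that the submission is acknowledged before the state-store write is durable.

```python
# Toy model (NOT YARN code): an RM that acknowledges a submission before the
# state-store write is durable, followed by a failover. All names are
# hypothetical and chosen only for illustration.

class StateStore:
    """Stands in for the ZooKeeper-based state store."""

    def __init__(self):
        self.apps = set()
        self.fenced = False  # True once the old RM may no longer write

    def write(self, app_id):
        if self.fenced:
            raise IOError("write rejected: RM is no longer active")
        self.apps.add(app_id)


class ResourceManager:
    def __init__(self, store):
        self.store = store
        self.apps = {}      # in-memory application states
        self.pending = []   # writes queued but not yet persisted

    def submit(self, app_id):
        # Acknowledge after queuing the write, not after it completes.
        self.apps[app_id] = "NEW_SAVING"
        self.pending.append(app_id)
        return app_id

    def flush(self):
        # Try to persist queued writes; return the ones that failed.
        failed = []
        for app_id in self.pending:
            try:
                self.store.write(app_id)
            except IOError:
                failed.append(app_id)
        self.pending = []
        return failed

    def recover(self):
        # A newly active RM only knows what the store contains.
        self.apps = {app_id: "RECOVERED" for app_id in self.store.apps}


store = StateStore()
old_rm, new_rm = ResourceManager(store), ResourceManager(store)

acked = old_rm.submit("application_1519310222933_0160")  # client sees success
store.fenced = True            # old RM loses the active role before the write lands
lost_writes = old_rm.flush()   # the queued write now fails
new_rm.recover()               # new RM becomes active and loads the store

print(acked in new_rm.apps)    # False: the acknowledged app is unknown to the new RM
```

      In this model the loss disappears if submit() only returns after write() succeeds. Whether that ordering is the right fix for the real RM is a separate question.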

       

      The symptom is an error message on the client side claiming that a previously submitted application ID does not exist:

      2018-02-22 14:54:50,258 [JobControl] WARN  org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider - Invocation returned exception on [rm1] : org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1519310222933_0160' doesn't exist in RM. Please check that the job submission was successful.

       

      This is a timeline excerpted from the resource manager logs:

      2018-02-22 14:54:06.7685260    headnode1        Storing application with id application_1519310222933_0160

      2018-02-22 14:54:06.7685660    headnode1              application_1519310222933_0160 State change from NEW to NEW_SAVING

      2018-02-22 14:54:17.8924760    headnode1        Transitioning to standby state

      2018-02-22 14:54:30.3951160    headnode0        Transitioning to active state
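
      To make the gaps in this timeline explicit, the intervals can be computed from the timestamps quoted above. This is only a small sketch: the parse_rm helper is hypothetical, and comparing the client timestamp with the RM timestamps assumes both logs use the same time zone and roughly synchronized clocks, which the logs themselves do not confirm.

```python
from datetime import datetime


def parse_rm(ts):
    # The RM log excerpt has 7 fractional digits; %f accepts at most 6,
    # so the last digit (100 ns resolution) is dropped before parsing.
    return datetime.strptime(ts[:26], "%Y-%m-%d %H:%M:%S.%f")


# Timestamps copied from the RM log excerpt and the client warning.
new_saving = parse_rm("2018-02-22 14:54:06.7685660")  # NEW -> NEW_SAVING
standby = parse_rm("2018-02-22 14:54:17.8924760")     # headnode1 to standby
active = parse_rm("2018-02-22 14:54:30.3951160")      # headnode0 to active
client_error = datetime.strptime(
    "2018-02-22 14:54:50,258", "%Y-%m-%d %H:%M:%S,%f"
)

saving_to_standby = (standby - new_saving).total_seconds()
standby_to_active = (active - standby).total_seconds()
active_to_error = (client_error - active).total_seconds()

print(round(saving_to_standby, 2))  # ~11.12 s in NEW_SAVING before failover
print(round(standby_to_active, 2))  # ~12.5 s with no active RM
print(round(active_to_error, 2))    # ~19.86 s until the client error (assumes same clock)
```

      Under these assumptions the application sat in NEW_SAVING for about 11 seconds before headnode1 transitioned to standby.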

          People

            Assignee: Unassigned
            Reporter: shanyu zhao