  Hadoop YARN / YARN-556 [Umbrella] RM Restart phase 2 - Work preserving restart / YARN-1373

Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None

    Description

      Currently the RM moves recovered app attempts to a terminal recovered state and starts a new attempt. Instead, it should transition the last attempt to a RUNNING state so that the attempt can proceed as normal once the running AM has resynced with the ApplicationMasterService (YARN-1365 and YARN-1366). If the RM had launched the AM container before dying, the AM would be up and trying to contact the RM. However, the RM may have died before launching the container. In that case, the RM should wait for the AM liveness period, issue a kill for the stored master container, transition the attempt to some RECOVER_ERROR state, and proceed to start a new attempt.
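      The recovery rule described above can be modelled as a small state check. The following is a hypothetical Python sketch, not the ResourceManager's Java code. `AttemptState`, `RecoveredAttempt`, `on_am_resync`, `check_liveness`, and the `kill_container` / `start_new_attempt` callbacks are illustrative names. `RECOVER_ERROR` is only the placeholder state named in this issue. The sketch assumes the RM knows when it recovered the attempt and uses a fixed AM liveness period. It also assumes that an AM resync before the deadline means the AM container had been launched before the RM died.

```python
from enum import Enum


class AttemptState(Enum):
    RUNNING = "RUNNING"
    # Placeholder name taken from the issue text; not an actual RM state.
    RECOVER_ERROR = "RECOVER_ERROR"


class RecoveredAttempt:
    """Hypothetical model of the last app attempt after an RM restart."""

    def __init__(self, attempt_id, master_container_id, recovered_at,
                 liveness_expiry, kill_container, start_new_attempt):
        self.attempt_id = attempt_id
        self.master_container_id = master_container_id
        # The AM must resync before this time or the attempt is given up.
        self.liveness_deadline = recovered_at + liveness_expiry
        self._kill_container = kill_container
        self._start_new_attempt = start_new_attempt
        # Proposed change: go back to RUNNING instead of a terminal
        # "recovered" state followed by a brand-new attempt.
        self.state = AttemptState.RUNNING
        self.resynced = False

    def on_am_resync(self):
        """The AM re-registered with the ApplicationMasterService."""
        if self.state is AttemptState.RUNNING:
            self.resynced = True

    def check_liveness(self, now):
        """Return True only if this call failed the attempt over to a new one."""
        if self.state is not AttemptState.RUNNING or self.resynced:
            return False  # already handled, or the AM is alive and proceeding
        if now < self.liveness_deadline:
            return False  # still within the AM liveness period; keep waiting
        # No resync in time: assume the RM died before launching the AM
        # container, so kill the stored master container and retry.
        self._kill_container(self.master_container_id)
        self.state = AttemptState.RECOVER_ERROR
        self._start_new_attempt(self.attempt_id)
        return True


if __name__ == "__main__":
    killed, retried = [], []
    att = RecoveredAttempt("attempt_1", "container_1", recovered_at=0.0,
                           liveness_expiry=600.0,
                           kill_container=killed.append,
                           start_new_attempt=retried.append)
    print(att.check_liveness(now=100.0))  # False: still waiting for the AM
    print(att.check_liveness(now=600.0))  # True: container_1 killed, retried
```

      The callbacks keep the sketch independent of any real container-management API. Once an attempt reaches RECOVER_ERROR, a late resync from its AM is ignored, so the stored master container is killed at most once.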


            People

              Assignee: Omkar Vinit Joshi
              Reporter: Bikas Saha
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: