Hadoop YARN / YARN-4401

A failed app recovery should not prevent the RM from starting

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels:
      None

      Description

      An app recovery can fail with an exception for many different reasons, and any such failure currently aborts RM startup. Presumably, the RM is attempting the recovery because it is the standby trying to fill in for the active, so failing to come up defeats the purpose of the HA configuration. Instead of preventing the RM from starting, a failed app recovery should log an error and skip the application.
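
      The proposed behavior could be sketched roughly as below. This is an illustration only, not the attached patch and not the actual ResourceManager code: the class `SkipFailedRecoverySketch`, the `AppRecoverer` interface, and the `recoverAll` method are hypothetical names, and saved application state is modeled as a plain string map. The point is only that the try/catch sits inside the per-application loop, so one bad application is logged and skipped while the rest are still recovered.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SkipFailedRecoverySketch {
    private static final Logger LOG =
        Logger.getLogger(SkipFailedRecoverySketch.class.getName());

    /** Hypothetical stand-in for whatever recovers a single application. */
    interface AppRecoverer {
        void recover(String appId, String savedState) throws Exception;
    }

    /**
     * Attempts to recover every saved application. A failure in one
     * application is logged and that application is skipped; the loop
     * continues, so startup is never aborted by a single bad entry.
     * Returns the IDs of the applications that were skipped.
     */
    static List<String> recoverAll(Map<String, String> savedApps,
                                   AppRecoverer recoverer) {
        List<String> skipped = new ArrayList<>();
        for (Map.Entry<String, String> entry : savedApps.entrySet()) {
            try {
                recoverer.recover(entry.getKey(), entry.getValue());
            } catch (Exception e) {
                // Log and move on instead of rethrowing.
                LOG.log(Level.SEVERE,
                    "Failed to recover application " + entry.getKey()
                        + "; skipping it", e);
                skipped.add(entry.getKey());
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        // Made-up application IDs and states, for demonstration only.
        Map<String, String> saved = new LinkedHashMap<>();
        saved.put("app_1", "ok");
        saved.put("app_2", "corrupt");
        saved.put("app_3", "ok");

        List<String> skipped = recoverAll(saved, (appId, state) -> {
            if ("corrupt".equals(state)) {
                throw new IllegalStateException("unreadable state for " + appId);
            }
        });
        System.out.println("Skipped applications: " + skipped);
    }
}
```

      Returning the skipped IDs is a design choice of this sketch: it lets the caller report or retry the applications that were dropped, rather than losing them silently.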

        Attachments

        1. YARN-4401.001.patch
          2 kB
          Daniel Templeton

              People

              • Assignee: Daniel Templeton (templedf)
              • Reporter: Daniel Templeton (templedf)
              • Votes: 0
              • Watchers: 9
