  1. Hadoop YARN
  2. YARN-3607

Allow users to choose between failing the daemons vs failing the apps/containers

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.0
    • None
    • None

    Description

      We often run into cases where we must choose between failing the daemon (fail-fast) and failing the user's work while keeping the cluster running. There is no clear right way to handle these situations: some users would prefer to be conservative and keep the daemons running, while others would prefer to fail fast.

      Today, we handle these situations case by case, going by what the people working on each one feel is the right approach. Examples include how we handle app recovery failures and queue changes on RM restart.

      Users should be able to choose between these two extremes and have all such situations handled the same way.
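
      To make the request concrete, here is a minimal sketch of what a single, uniformly applied choice could look like. It is not an existing YARN API: `FailurePolicy`, `FailureHandler`, and `DaemonFatalException` are hypothetical names invented for illustration, and the sketch assumes every recoverable-error site routes its decision through one handler.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cluster-wide choice between the two extremes. */
enum FailurePolicy { FAIL_DAEMON, FAIL_APP }

/** Hypothetical exception signalling that the daemon itself should stop. */
class DaemonFatalException extends RuntimeException {
    DaemonFatalException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical single decision point shared by all error sites. */
class FailureHandler {
    private final FailurePolicy policy;
    private final List<String> failedApps = new ArrayList<>();

    FailureHandler(FailurePolicy policy) {
        this.policy = policy;
    }

    /** Applies the configured policy to an error tied to one application. */
    void handle(String appId, Exception cause) {
        if (policy == FailurePolicy.FAIL_DAEMON) {
            // Fail-fast: surface the error so the daemon shuts down.
            throw new DaemonFatalException("Fatal error while handling " + appId, cause);
        }
        // Conservative: fail only this application and keep the daemon running.
        failedApps.add(appId);
    }

    /** Applications that were failed instead of the daemon. */
    List<String> failedApps() {
        return failedApps;
    }
}
```

      With this shape, app recovery failures and queue changes on RM restart would both call `handle(...)` rather than each deciding independently, so one setting would govern every case.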

            People

              Assignee: Ray Chiang (rchiang)
              Reporter: Karthik Kambatla (kasha)
              Votes: 1
              Watchers: 7

              Dates

                Created:
                Updated: