Hadoop YARN / YARN-4334

Ability to avoid ResourceManager recovery if state store is "too old"

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: resourcemanager
    • Labels: None

    Description

      There are times when a ResourceManager has been down long enough that ApplicationMasters, and potentially external client-side monitoring mechanisms, have given up completely. If the ResourceManager then starts back up and tries to recover, the RM can launch new application attempts for the AMs that gave up while the client also launches another instance of the app because it assumed everything was dead.

      It would be nice if the RM could optionally be configured to skip recovery when the state store is "too old." The RM would then come up without any applications recovered, but we would avoid the double-submission situation.
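
      To make the request concrete, the age check could look roughly like the sketch below. This is only an illustration and is not taken from any of the attached patches: the class name StateStoreAgeCheck, the method shouldRecover, and the idea of comparing the last state-store update time against a configured maximum age are all assumptions. It also assumes that a zero or negative maximum age means the check is disabled, so the RM recovers as it does today.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical helper illustrating a "too old" check for the RM state store.
 * None of these names come from the YARN code base or the attached patches.
 */
final class StateStoreAgeCheck {
    private StateStoreAgeCheck() {
        // static helper only
    }

    /**
     * Returns true if the RM should attempt recovery from the state store.
     *
     * @param lastStoreUpdate assumed timestamp of the last write to the state store
     * @param now             current time
     * @param maxAge          assumed configured limit; zero or negative disables the check
     */
    static boolean shouldRecover(Instant lastStoreUpdate, Instant now, Duration maxAge) {
        if (maxAge.isZero() || maxAge.isNegative()) {
            // Check disabled: always recover.
            return true;
        }
        Duration age = Duration.between(lastStoreUpdate, now);
        // A negative age (clock skew) compares as smaller than maxAge, so the store counts as fresh.
        return age.compareTo(maxAge) <= 0;
    }
}
```

      With a one-hour limit, a store last written ten minutes ago would be recovered, while one last written three hours ago would be skipped and the RM would start with no applications. How the last-update time is actually tracked in the state store is left open here.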

      Attachments

        1. YARN-4334.2.patch
          44 kB
          Chang Li
        2. YARN-4334.3.patch
          45 kB
          Chang Li
        3. YARN-4334.4.2.patch
          45 kB
          Chang Li
        4. YARN-4334.4.patch
          45 kB
          Chang Li
        5. YARN-4334.patch
          39 kB
          Chang Li
        6. YARN-4334.wip.2.patch
          18 kB
          Chang Li
        7. YARN-4334.wip.3.patch
          29 kB
          Chang Li
        8. YARN-4334.wip.4.patch
          34 kB
          Chang Li
        9. YARN-4334.wip.patch
          10 kB
          Chang Li

          People

            Assignee: Chang Li (lichangleo)
            Reporter: Jason Darrell Lowe (jlowe)
            Votes: 0
            Watchers: 10

            Dates

              Created:
              Updated: