
    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.6.0
    • Component/s: nodemanager
    • Labels:
      None
    • Target Version/s:

      Description

      To support work-preserving NM restart, we need to recover the state of the containers that were present when the NodeManager went down. This includes informing the RM of containers that exited in the interim, a strategy for handling the exit codes of those containers, and a way to reacquire the still-active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered.
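      The recovery decisions listed in the description can be sketched as follows. This is a minimal, hypothetical Java illustration, not the YARN implementation: the names ContainerRecoverySketch, PersistedState, Outcome, Host, recover, and reacquire are invented for this example. It assumes the NM persisted a per-container state before going down, and that a container's exit code is recorded somewhere the NM can read once the process ends (for example, by a wrapper around the container launch).

```java
import java.util.Optional;

/**
 * Hypothetical sketch of NM-restart container recovery. These types and
 * methods are illustrative only; they are not YARN APIs.
 */
public class ContainerRecoverySketch {

    /** Last state the NM persisted for a container before it went down (assumed). */
    enum PersistedState { REQUESTED, LAUNCHED, COMPLETED }

    /** What the restarted NM would report to the RM for a container. */
    enum Outcome { NOT_LAUNCHED, STILL_RUNNING, EXITED, LOST }

    record RecoveredContainer(String id, PersistedState state, Integer storedExitCode) {}

    record RecoveryResult(String id, Outcome outcome, Integer exitCode) {}

    /** Assumed host hooks: process liveness and an exit code recorded after the process ends. */
    interface Host {
        boolean isAlive(String containerId);
        Optional<Integer> readExitCode(String containerId);
    }

    /** Classify one recovered container right after the NM restarts. */
    static RecoveryResult recover(RecoveredContainer c, Host host) {
        switch (c.state()) {
            case COMPLETED:
                // Finished before the restart: replay the stored exit status.
                return new RecoveryResult(c.id(), Outcome.EXITED, c.storedExitCode());
            case REQUESTED:
                // Never launched, so there is no process to reacquire.
                return new RecoveryResult(c.id(), Outcome.NOT_LAUNCHED, null);
            default:
                // LAUNCHED: the process may have exited while the NM was down.
                return probe(c.id(), host);
        }
    }

    /** Check a launched container once: exited, still running, or lost. */
    private static RecoveryResult probe(String id, Host host) {
        Optional<Integer> code = host.readExitCode(id);
        if (code.isPresent()) {
            return new RecoveryResult(id, Outcome.EXITED, code.get());
        }
        if (host.isAlive(id)) {
            return new RecoveryResult(id, Outcome.STILL_RUNNING, null);
        }
        // The process may have recorded its code and exited between the two
        // checks above, so look once more before declaring it lost.
        return host.readExitCode(id)
                .map(c -> new RecoveryResult(id, Outcome.EXITED, c))
                .orElseGet(() -> new RecoveryResult(id, Outcome.LOST, null));
    }

    /**
     * Reacquire a running container. The restarted NM is not the process's
     * parent and cannot wait() on it, so it polls until an exit code shows up
     * or the process disappears without one.
     */
    static RecoveryResult reacquire(String id, Host host, long pollMillis) {
        while (true) {
            RecoveryResult r = probe(id, host);
            if (r.outcome() != Outcome.STILL_RUNNING) {
                return r;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while reacquiring " + id, e);
            }
        }
    }
}
```

      Polling is used here only because, in this sketch, the restarted NM has no parent relationship with the reacquired processes; the second exit-code check in probe keeps a container that exits between the two checks from being misreported as lost.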

        Attachments

        1. YARN-1337-v1.patch
          99 kB
          Jason Darrell Lowe
        2. YARN-1337-v2.patch
          112 kB
          Jason Darrell Lowe
        3. YARN-1337-v3.patch
          118 kB
          Jason Darrell Lowe

              People

              • Assignee:
                Jason Darrell Lowe (jlowe)
                Reporter:
                Jason Darrell Lowe (jlowe)
              • Votes:
                0
                Watchers:
                15
