  1. Hadoop YARN
  2. YARN-128 [Umbrella] RM Restart Phase 1: State storage and non-work-preserving recovery
  3. YARN-540

Race condition causing RM to potentially relaunch already unregistered AMs on RM restart

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.1-beta
    • Component/s: resourcemanager
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When a job succeeds and successfully calls finishApplicationMaster, the RM may shut down and restart with the dispatcher stopped before it can process the REMOVE_APP event. The next time the RM comes back up, it reloads the existing state files even though the job has succeeded, so it may relaunch an AM that has already unregistered.
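
The race described above can be illustrated with a small toy model. This is only a sketch: every class and method name below (`StateStore`, `ResourceManager`, `finishApplicationMasterAsync`, `finishApplicationMasterSync`, `recover`) is hypothetical and is not the real YARN API. The synchronous variant shows one possible mitigation and is not necessarily what the attached patches do.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Toy model of the race; every name here is hypothetical, not the real YARN API. */
public class RmRestartRaceSketch {

    /** Stand-in for the persisted state files that survive an RM restart. */
    static final class StateStore {
        private final Set<String> apps = new HashSet<>();
        void storeApp(String appId) { apps.add(appId); }
        void removeApp(String appId) { apps.remove(appId); }
        Set<String> loadApps() { return new HashSet<>(apps); }
    }

    /** Stand-in for the RM: store cleanup is deferred to an async dispatcher. */
    static final class ResourceManager {
        private final StateStore store;
        private final Queue<String> pendingRemoveApp = new ArrayDeque<>();

        ResourceManager(StateStore store) { this.store = store; }

        void submit(String appId) { store.storeApp(appId); }

        /** Racy path: acknowledge the AM right away, queue REMOVE_APP for later. */
        void finishApplicationMasterAsync(String appId) {
            pendingRemoveApp.add(appId);
        }

        /** Assumed mitigation: update the store before acknowledging the AM. */
        void finishApplicationMasterSync(String appId) {
            store.removeApp(appId);
        }

        /** Normal operation: the dispatcher eventually drains queued events. */
        void drainDispatcher() {
            while (!pendingRemoveApp.isEmpty()) {
                store.removeApp(pendingRemoveApp.poll());
            }
        }

        /** Shutdown stops the dispatcher; queued events are dropped unprocessed. */
        void shutdown() { pendingRemoveApp.clear(); }
    }

    /** On restart, every app still present in the store is a relaunch candidate. */
    static Set<String> recover(StateStore store) { return store.loadApps(); }

    public static void main(String[] args) {
        StateStore racy = new StateStore();
        ResourceManager rm1 = new ResourceManager(racy);
        rm1.submit("app_1");
        rm1.finishApplicationMasterAsync("app_1");
        rm1.shutdown(); // dispatcher stopped before REMOVE_APP was handled
        System.out.println("racy recovery: " + recover(racy)); // [app_1]

        StateStore safe = new StateStore();
        ResourceManager rm2 = new ResourceManager(safe);
        rm2.submit("app_1");
        rm2.finishApplicationMasterSync("app_1");
        rm2.shutdown();
        System.out.println("sync recovery: " + recover(safe)); // []
    }
}
```

In the racy run the finished application is still in the store after shutdown, so recovery treats it as a candidate for relaunch. In the synchronous run the store is already clean when the RM stops.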

      Attachments

        1. YARN-540.1.patch
          18 kB
          Jian He
        2. YARN-540.10.patch
          44 kB
          Jian He
        3. YARN-540.10.patch
          44 kB
          Jian He
        4. YARN-540.11.patch
          46 kB
          Jian He
        5. YARN-540.2.patch
          32 kB
          Jian He
        6. YARN-540.3.patch
          12 kB
          Jian He
        7. YARN-540.4.patch
          33 kB
          Jian He
        8. YARN-540.5.patch
          33 kB
          Jian He
        9. YARN-540.6.patch
          36 kB
          Jian He
        10. YARN-540.7.patch
          37 kB
          Jian He
        11. YARN-540.7.patch
          37 kB
          Jian He
        12. YARN-540.8.patch
          40 kB
          Jian He
        13. YARN-540.9.patch
          45 kB
          Jian He
        14. YARN-540.9.patch
          45 kB
          Jian He
        15. YARN-540.patch
          7 kB
          Jian He
        16. YARN-540.patch
          7 kB
          Jian He

            People

              Assignee: Jian He (jianhe)
              Reporter: Jian He (jianhe)
              Votes: 0
              Watchers: 7
