  1. Hadoop YARN
  2. YARN-556 [Umbrella] RM Restart phase 2 - Work preserving restart
  3. YARN-1372

Ensure all completed containers are reported to the AMs across RM restart

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0
    • Component/s: resourcemanager
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Currently, the NM informs the RM about completed containers and then removes those containers from its RM notification list. The RM passes the completed-container information on to the AM, which pulls it. If the RM dies before the AM pulls this data, the AM may never be able to get it. To fix this, the NM should maintain a separate list of the completed-container notifications it has sent to the RM. After the AM has pulled the containers from the RM, the RM informs the NM, and the NM can then remove those completed containers from the new list. Upon re-registering with the RM (after an RM restart), the NM should send the entire list of completed containers to the RM, along with any other containers that completed while the RM was down. This ensures that the RM can inform the AMs about all completed containers. Some container completions may be reported more than once, since the AM may have pulled the container but the RM may die before notifying the NM about the pull.
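
      The bookkeeping proposed above could be sketched roughly as follows. This is only an illustration of the idea, not the code from the attached patches: the class CompletedContainerTracker, its method names, and the use of plain String container IDs are all hypothetical and do not correspond to actual NodeManager classes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical NM-side bookkeeping for completed containers (illustrative only). */
final class CompletedContainerTracker {
    // Completed containers that have not yet been reported to the RM.
    private final Set<String> pendingReport = new LinkedHashSet<>();
    // Completed containers already reported to the RM, kept until the RM
    // confirms that the AM has pulled them.
    private final Set<String> sentAwaitingAck = new LinkedHashSet<>();

    /** Called when a container finishes on this node. */
    synchronized void containerCompleted(String containerId) {
        if (!sentAwaitingAck.contains(containerId)) {
            pendingReport.add(containerId);
        }
    }

    /** Regular heartbeat: report new completions and remember them until acknowledged. */
    synchronized List<String> buildHeartbeatReport() {
        List<String> report = new ArrayList<>(pendingReport);
        sentAwaitingAck.addAll(pendingReport);
        pendingReport.clear();
        return report;
    }

    /** The RM tells the NM which completed containers the AM has pulled. */
    synchronized void ackPulledByAM(Collection<String> containerIds) {
        sentAwaitingAck.removeAll(containerIds);
    }

    /**
     * Re-registration after an RM restart: resend every unacknowledged completion
     * plus anything that completed while the RM was down.
     */
    synchronized List<String> buildReregistrationReport() {
        List<String> report = new ArrayList<>(sentAwaitingAck);
        report.addAll(pendingReport);
        sentAwaitingAck.addAll(pendingReport);
        pendingReport.clear();
        return report;
    }
}
```

      In this sketch an entry leaves the NM only after an explicit acknowledgement, so a completion can be resent after a restart even if the AM already saw it. That matches the duplicate-report caveat in the description, and an AM would need to tolerate seeing the same container ID twice.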

      Attachments

        1. YARN-1372.prelim2.patch
          56 kB
          Anubhav Dhoot
        2. YARN-1372.prelim.patch
          53 kB
          Anubhav Dhoot
        3. YARN-1372.010.patch
          96 kB
          Anubhav Dhoot
        4. YARN-1372.009.patch
          100 kB
          Anubhav Dhoot
        5. YARN-1372.009.patch
          93 kB
          Anubhav Dhoot
        6. YARN-1372.008.patch
          93 kB
          Anubhav Dhoot
        7. YARN-1372.007.patch
          93 kB
          Anubhav Dhoot
        8. YARN-1372.006.patch
          88 kB
          Anubhav Dhoot
        9. YARN-1372.005.patch
          78 kB
          Anubhav Dhoot
        10. YARN-1372.005.patch
          78 kB
          Anubhav Dhoot
        11. YARN-1372.004.patch
          77 kB
          Anubhav Dhoot
        12. YARN-1372.003.patch
          73 kB
          Anubhav Dhoot
        13. YARN-1372.002_RMHandlesCompletedApp.patch
          82 kB
          Anubhav Dhoot
        14. YARN-1372.002_RMHandlesCompletedApp.patch
          82 kB
          Anubhav Dhoot
        15. YARN-1372.002_NMHandlesCompletedApp.patch
          86 kB
          Anubhav Dhoot
        16. YARN-1372.001.patch
          70 kB
          Anubhav Dhoot
        17. YARN-1372.001.patch
          73 kB
          Anubhav Dhoot

            People

              Assignee: Anubhav Dhoot (adhoot)
              Reporter: Bikas Saha (bikassaha)
              Votes: 0
              Watchers: 13