  1. Hadoop YARN
  2. YARN-4725 [Umbrella] Auto-restart of containers
  3. YARN-3998

Add support in the NodeManager to re-launch containers


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 2.9.0, 3.0.0-alpha1

    Description

      I'd like to add a field (retry-times) to ContainerLaunchContext. When the AM launches a container, it can specify this value. The NM will then re-launch the container up to 'retry-times' times when it fails to run (e.g. its exit code is not 0).

      This saves a lot of time: it avoids repeating container localization, and the RM does not need to re-schedule the container. Local files in the container's working directory are also left in place for re-use (if the container has downloaded some big files, it does not need to re-download them when it runs again).

      We have found this useful in systems like Storm.
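
      The re-launch behaviour described above can be sketched roughly as follows. This is only an illustration of the proposed semantics, not the NodeManager's actual code: the class RelaunchSketch, the method runWithRetries, and the Result type are hypothetical names, and the container launch is modelled as a plain function that returns an exit code. It assumes 'retry-times' counts re-launches only, so a container may be launched at most retry-times + 1 times.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch of the proposed semantics; not NodeManager code. */
public class RelaunchSketch {

    /** Outcome of running one container with re-launch support. */
    public static final class Result {
        public final int exitCode;  // exit code of the last attempt
        public final int launches;  // total launches, including the first one

        Result(int exitCode, int launches) {
            this.exitCode = exitCode;
            this.launches = launches;
        }
    }

    /**
     * Launches the container once, then re-launches it while it keeps
     * failing (non-zero exit code) and fewer than retryTimes re-launches
     * have been made. The working directory is assumed to be left
     * untouched between attempts, so nothing is re-localized here.
     */
    public static Result runWithRetries(int retryTimes, IntSupplier launch) {
        if (retryTimes < 0) {
            throw new IllegalArgumentException("retryTimes must be >= 0");
        }
        int launches = 0;
        int exitCode;
        do {
            exitCode = launch.getAsInt();
            launches++;
        } while (exitCode != 0 && launches <= retryTimes);
        return new Result(exitCode, launches);
    }

    public static void main(String[] args) {
        // Simulated container: fails twice, then succeeds.
        int[] exitCodes = {1, 1, 0};
        int[] next = {0};
        Result r = runWithRetries(3, () -> exitCodes[next[0]++]);
        System.out.println("exitCode=" + r.exitCode + ", launches=" + r.launches);
    }
}
```

      With retry-times set to 0 the sketch behaves like a container without this feature: it is launched exactly once and its exit code is reported as-is.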

      Attachments

        1. YARN-3998.09.patch
          101 kB
          Jun Gong
        2. YARN-3998.08.patch
          102 kB
          Jun Gong
        3. YARN-3998.07.patch
          92 kB
          Jun Gong
        4. YARN-3998.06.patch
          72 kB
          Jun Gong
        5. YARN-3998.05.patch
          68 kB
          Jun Gong
        6. YARN-3998.04.patch
          68 kB
          Jun Gong
        7. YARN-3998.03.patch
          133 kB
          Jun Gong
        8. YARN-3998.02.patch
          60 kB
          Jun Gong
        9. YARN-3998.01.patch
          45 kB
          Jun Gong


          People

            Assignee: Jun Gong (hex108)
            Reporter: Jun Gong (hex108)
            Votes: 0
            Watchers: 19

