  1. Hadoop YARN
  2. YARN-896 Roll up for long-lived services in YARN
  3. YARN-3480

Recovery may get very slow with lots of services with lots of app-attempts


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: resourcemanager
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries (attempts), so it is better to set 'yarn.resourcemanager.am.max-attempts' to a larger value. However, this makes RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the RM recovery process much slower. It might be better to limit the maximum number of attempts stored in RMStateStore.

      BTW: When 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to a small value, the number of retried attempts might be very large, so we need to delete some of the attempts stored in RMStateStore.
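
      The idea of capping the number of stored attempts can be sketched as below. This is only an illustration under stated assumptions, not the code from the attached patches: the class AttemptPruner, its maxStoredAttempts limit, and the string attempt ids are all hypothetical names and do not correspond to the real RMStateStore API. The sketch keeps the newest attempts and reports which older ones a caller would then delete from the store.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch: tracks the attempts kept for one application and
 * evicts the oldest ones once a configured limit is exceeded.
 * None of these names exist in Hadoop; they are for illustration only.
 */
final class AttemptPruner {
    // Assumed limit on how many attempts stay in the state store.
    private final int maxStoredAttempts;
    // Attempt ids in creation order, oldest first.
    private final Deque<String> storedAttempts = new ArrayDeque<>();

    AttemptPruner(int maxStoredAttempts) {
        if (maxStoredAttempts < 1) {
            throw new IllegalArgumentException("maxStoredAttempts must be >= 1");
        }
        this.maxStoredAttempts = maxStoredAttempts;
    }

    /**
     * Records a new attempt and returns the ids of the attempts that fell
     * out of the window; a caller would remove those from the store.
     */
    List<String> addAttempt(String attemptId) {
        storedAttempts.addLast(attemptId);
        List<String> evicted = new ArrayList<>();
        while (storedAttempts.size() > maxStoredAttempts) {
            evicted.add(storedAttempts.removeFirst());
        }
        return evicted;
    }

    /** Returns a copy of the attempt ids currently kept, oldest first. */
    List<String> storedAttempts() {
        return new ArrayList<>(storedAttempts);
    }
}
```

      With a limit like this, the number of attempts read back during recovery is bounded per application no matter how many retries occurred, which is the effect the description asks for.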

      Attachments

        1. YARN-3480.01.patch
          37 kB
          Jun Gong
        2. YARN-3480.02.patch
          37 kB
          Jun Gong
        3. YARN-3480.03.patch
          43 kB
          Jun Gong
        4. YARN-3480.04.patch
          32 kB
          Jun Gong
        5. YARN-3480.05.patch
          29 kB
          Jun Gong
        6. YARN-3480.06.patch
          29 kB
          Jun Gong
        7. YARN-3480.07.patch
          33 kB
          Jun Gong
        8. YARN-3480.08.patch
          25 kB
          Jun Gong
        9. YARN-3480.09.patch
          25 kB
          Jun Gong
        10. YARN-3480.10.patch
          36 kB
          Jun Gong
        11. YARN-3480.11.patch
          36 kB
          Jun Gong
        12. YARN-3480.12.patch
          41 kB
          Jun Gong
        13. YARN-3480.13.patch
          36 kB
          Jun Gong
        14. YARN-3480.14.patch
          37 kB
          Jun Gong
        15. YARN-3480.15.patch
          37 kB
          Jun Gong


          People

            Assignee: Jun Gong (hex108)
            Reporter: Jun Gong (hex108)
            Votes: 0
            Watchers: 17

            Dates

              Created:
              Updated:
              Resolved:
