Hadoop YARN / YARN-542

Change the default global AM max-attempts value to be not one

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0-beta
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Today, the global AM max-attempts is set to 1, which is a bad choice. AM max-attempts accounts for both AM-level failures and container crashes due to localization issues, lost nodes, etc. To account for AM crashes due to problems that are not caused by user code, mainly lost nodes, we want to give AMs some retries.

      I propose we change it to at least two. We could also change it to 4 to match other retry configs.
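
      The effect of the attempt limit described above can be illustrated with a toy model. This is only a sketch, not YARN code: the names `AttemptOutcome` and `run_application` are hypothetical, and it assumes simply that every failed attempt, whatever its cause, counts against the same max-attempts budget.

```python
from enum import Enum


class AttemptOutcome(Enum):
    """Hypothetical outcomes of a single AM attempt (illustration only)."""
    SUCCEEDED = "succeeded"
    AM_FAILED = "am_failed"                      # failure inside the AM itself
    NODE_LOST = "node_lost"                      # node hosting the AM was lost
    LOCALIZATION_FAILED = "localization_failed"  # resources could not be localized


def run_application(max_attempts, outcomes):
    """Replay a scripted list of attempt outcomes under an attempt limit.

    Returns (succeeded, attempts_used). Every non-successful outcome
    consumes one attempt, regardless of whether user code was at fault.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempts_used = 0
    for outcome in outcomes[:max_attempts]:
        attempts_used += 1
        if outcome is AttemptOutcome.SUCCEEDED:
            return True, attempts_used
    return False, attempts_used


# An application whose first attempt dies because its node is lost.
scripted = [AttemptOutcome.NODE_LOST, AttemptOutcome.SUCCEEDED]

one_attempt = run_application(1, scripted)   # the limit criticized in this issue
two_attempts = run_application(2, scripted)  # the proposed minimum
```

      In this model, a limit of 1 turns a single lost node into an application failure, while a limit of 2 lets the second attempt complete.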

        Issue Links

          Activity

          Vinod Kumar Vavilapalli created issue -
          Vinod Kumar Vavilapalli made changes -
          Field Original Value New Value
          Assignee Vinod Kumar Vavilapalli [ vinodkv ]
          Zhijie Shen made changes -
          Assignee Vinod Kumar Vavilapalli [ vinodkv ] Zhijie Shen [ zjshen ]
          Zhijie Shen made changes -
          Link This issue is blocked by YARN-378 [ YARN-378 ]
          Zhijie Shen made changes -
          Link This issue is related to MAPREDUCE-5145 [ MAPREDUCE-5145 ]
          Zhijie Shen made changes -
          Summary
          Original Value: Change the default AM retry value to be not one
          New Value: Change the default global AM max-attempts value to be not one
          Zhijie Shen made changes -
          Description
          Original Value: Today, the AM max-retries is set to 1, which is a bad choice. AM max-retries accounts for both AM-level failures and container crashes due to localization issues, lost nodes, etc. To account for AM crashes due to problems that are not caused by user code, mainly lost nodes, we want to give AMs some retries.

          I propose we change it to at least two. We could also change it to 4 to match other retry configs.
          New Value: Today, the global AM max-attempts is set to 1, which is a bad choice. AM max-attempts accounts for both AM-level failures and container crashes due to localization issues, lost nodes, etc. To account for AM crashes due to problems that are not caused by user code, mainly lost nodes, we want to give AMs some retries.

          I propose we change it to at least two. We could also change it to 4 to match other retry configs.
          Zhijie Shen made changes -
          Link This issue is blocked by YARN-378 [ YARN-378 ]
          Zhijie Shen made changes -
          Link This issue relates to YARN-378 [ YARN-378 ]
          Zhijie Shen made changes -
          Attachment YARN-542.1.patch [ 12578289 ]
          Zhijie Shen made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Vinod Kumar Vavilapalli made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 2.0.5-beta [ 12324029 ]
          Resolution Fixed [ 1 ]
          Bikas Saha made changes -
          Link This issue relates to YARN-614 [ YARN-614 ]
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Zhijie Shen
              Reporter:
              Vinod Kumar Vavilapalli
            • Votes:
              0
              Watchers:
              8

              Dates

              • Created:
                Updated:
                Resolved:
