  1. Hadoop YARN
  2. YARN-8710

Service AM should set a finite limit on NM container max retries

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: yarn-native-services
    • Labels:
      None

      Description

      Container retries currently default to -1 in AbstractProviderService.buildContainerRetry. Unless the service spec overrides this with a finite value for yarn.service.container-failure.retry.max, the NM retries the container indefinitely under the ALWAYS and ON_FAILURE restart policies. Ideally, the NM should retry a finite number of times on the same node, after which the Service AM can retry on another node.

      We can set the default value to 3.
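The proposed behaviour can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the attached patches: the class `RetryLimitSketch` and the helpers `resolveMaxRetries` and `nmMayRetry` are hypothetical names, and only the property key, the old default of -1, and the proposed default of 3 come from the description above.

```java
import java.util.Map;

/** Illustrative sketch only; names here are hypothetical, not YARN APIs. */
final class RetryLimitSketch {
    // Property key as named in the issue description.
    static final String RETRY_MAX_KEY = "yarn.service.container-failure.retry.max";
    // Finite default proposed in the issue; the previous default was -1.
    static final int DEFAULT_RETRY_MAX = 3;

    /** Returns the configured limit, or the finite default when the key is absent. */
    static int resolveMaxRetries(Map<String, String> serviceProps) {
        String value = serviceProps.get(RETRY_MAX_KEY);
        return value == null ? DEFAULT_RETRY_MAX : Integer.parseInt(value.trim());
    }

    /** True when another retry on the same node is still allowed. */
    static boolean nmMayRetry(int maxRetries, int retriesSoFar) {
        // A negative limit models the old -1 behaviour: no upper bound.
        return maxRetries < 0 || retriesSoFar < maxRetries;
    }

    public static void main(String[] args) {
        int limit = resolveMaxRetries(Map.of());
        int retries = 0;
        // Count same-node retries until the limit is exhausted.
        while (nmMayRetry(limit, retries)) {
            retries++;
        }
        System.out.println("Same-node retries before handing back to the AM: " + retries);
    }
}
```

With a finite limit the loop in `main` terminates, at which point the Service AM would be free to request a container on another node; with a limit of -1 it would never exit, which is the problem this issue describes.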

        Attachments

        1. YARN-8710.1.patch
          1 kB
          Suma Shivaprasad
        2. YARN-8710.2.patch
          2 kB
          Suma Shivaprasad

            People

            • Assignee:
              Suma Shivaprasad
            • Reporter:
              Suma Shivaprasad
            • Votes:
              0
            • Watchers:
              5
