  1. Hadoop YARN
  2. YARN-8710

Service AM should set a finite limit on NM container max retries


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: yarn-native-services
    • Labels: None

    Description

      Container retries currently default to -1 in AbstractProviderService.buildContainerRetry. Unless the service spec overrides this with a finite value for yarn.service.container-failure.retry.max, the NM retries the container indefinitely under the ALWAYS and ON_FAILURE restart policies. Ideally, the container should be retried a finite number of times on the same NM, after which the Service AM can retry on another node.

      We can set the default value to 3.
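
      The proposed behavior can be illustrated with a minimal, self-contained sketch. It is not the code from the attached patches: the class RetryDefaults and its methods are hypothetical names introduced here, and only the property name, the old default of -1, and the proposed default of 3 come from the description above. The sketch assumes the service spec properties are available as a plain string map.

```java
import java.util.Map;

// Hypothetical illustration only; not the actual AbstractProviderService code.
class RetryDefaults {
    // Property name as given in the issue description.
    static final String CONTAINER_RETRY_MAX = "yarn.service.container-failure.retry.max";
    // Finite default proposed in the issue (the previous default was -1).
    static final int DEFAULT_CONTAINER_RETRY_MAX = 3;

    // Returns the value from the service spec if present, otherwise the finite default.
    static int resolveMaxRetries(Map<String, String> specProperties) {
        String raw = specProperties.get(CONTAINER_RETRY_MAX);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_CONTAINER_RETRY_MAX;
        }
        return Integer.parseInt(raw.trim());
    }

    // A negative limit is treated as "retry forever" on the same NM;
    // otherwise retries stop once the limit is reached.
    static boolean shouldRetryOnSameNode(int maxRetries, int retriesSoFar) {
        return maxRetries < 0 || retriesSoFar < maxRetries;
    }

    public static void main(String[] args) {
        int limit = resolveMaxRetries(Map.of());
        System.out.println("max retries on the same NM: " + limit);
        System.out.println("retry after 3 failures: " + shouldRetryOnSameNode(limit, 3));
    }
}
```

      With no override in the spec, the limit resolves to 3, so the fourth failure is no longer retried on the same NM and the decision falls back to the Service AM. An explicit -1 in the spec still opts in to unlimited NM retries in this sketch.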

      Attachments

        1. YARN-8710.2.patch
          2 kB
          Suma Shivaprasad
        2. YARN-8710.1.patch
          1 kB
          Suma Shivaprasad


          People

            Assignee: Suma Shivaprasad
            Reporter: Suma Shivaprasad
            Votes: 0
            Watchers: 5
