  1. Hadoop YARN
  2. YARN-8362

Number of remaining retries is updated twice after a container failure in NM

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.0, 3.1.1
    • Component/s: None
    • Labels:
      None

      Description

      With YARN-5015, the shouldRetry(int errorCode) method in ContainerImpl also updates some fields in the retry context: remaining retries and restart times.

      This method is also called directly from outside the ContainerImpl class, in ContainerLaunch.setContainerCompletedStatus. This causes the following problems:

      1. remainingRetries is updated more than once after a failure. If maxRetries = 1, a retry is never triggered, because the repeated calls to shouldRetry(int errorCode) use up the only retry before it can be triggered.
      2. Writes to retryContext should be protected and performed only while the write lock is held.
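
The two problems above can be illustrated with a minimal, self-contained sketch. It is not the Hadoop code and not the attached patch: the classes RetrySketch, MutatingQuery and SeparatedUpdate, the recordRetry() method, and the rule that any nonzero error code counts as a failure are all assumptions made for illustration. The first class shows a query that consumes a retry as a side effect; the second shows one possible way to separate the read-only check from the update and guard the update with a write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a container's retry bookkeeping; not Hadoop code. */
final class RetrySketch {

  /** Problematic shape: asking "should I retry?" also consumes a retry. */
  static final class MutatingQuery {
    private int remainingRetries;

    MutatingQuery(int maxRetries) {
      this.remainingRetries = maxRetries;
    }

    boolean shouldRetry(int errorCode) {
      if (errorCode == 0) {
        return false; // assumption: exit code 0 means success, nothing to retry
      }
      if (remainingRetries > 0) {
        remainingRetries--; // side effect hidden inside a query method
        return true;
      }
      return false;
    }
  }

  /** One possible fix: side-effect-free query, update only under the write lock. */
  static final class SeparatedUpdate {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int remainingRetries;

    SeparatedUpdate(int maxRetries) {
      this.remainingRetries = maxRetries;
    }

    /** Read-only check; safe to call any number of times. */
    boolean shouldRetry(int errorCode) {
      lock.readLock().lock();
      try {
        return errorCode != 0 && remainingRetries > 0;
      } finally {
        lock.readLock().unlock();
      }
    }

    /** Hypothetical update step, called once per retry that is actually triggered. */
    void recordRetry() {
      lock.writeLock().lock();
      try {
        if (remainingRetries > 0) {
          remainingRetries--;
        }
      } finally {
        lock.writeLock().unlock();
      }
    }
  }

  /** One failure: an outside caller queries first, then the owner decides. */
  static boolean mutatingQueryTriggersRetry(int maxRetries) {
    MutatingQuery ctx = new MutatingQuery(maxRetries);
    ctx.shouldRetry(1);        // outside caller, e.g. while recording completion status
    return ctx.shouldRetry(1); // owner's decision sees an already-decremented count
  }

  /** Same call sequence against the separated design. */
  static boolean separatedUpdateTriggersRetry(int maxRetries) {
    SeparatedUpdate ctx = new SeparatedUpdate(maxRetries);
    ctx.shouldRetry(1); // outside caller; no state change
    boolean retry = ctx.shouldRetry(1);
    if (retry) {
      ctx.recordRetry(); // single, explicit update
    }
    return retry;
  }

  public static void main(String[] args) {
    System.out.println(mutatingQueryTriggersRetry(1));   // prints "false"
    System.out.println(separatedUpdateTriggersRetry(1)); // prints "true"
  }
}
```

With maxRetries = 1, the first class loses its only retry to the outside call, while the second still triggers the retry because only recordRetry() changes state.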

        Attachments

        1. YARN-8362.002.patch
          8 kB
          Chandni Singh
        2. YARN-8362.001.patch
          8 kB
          Chandni Singh

            People

            • Assignee:
              csingh Chandni Singh
              Reporter:
              csingh Chandni Singh
            • Votes:
              0
              Watchers:
              3
