  1. Hadoop YARN
  2. YARN-4180

AMLauncher does not retry on failures when talking to NM

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      We see issues when the RM tries to launch a container while an NM is restarting, and we get exceptions such as NMNotReadyException. While YARN-3842 added retry for other clients of the NM (mainly AMs), it is not used by AMLauncher in the RM, so these intermittent errors cause job failures. This can manifest during a rolling restart of NMs.
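
      To illustrate the kind of retry the description asks for, here is a minimal, self-contained sketch. It is not the attached patch and does not use Hadoop classes: `NodeNotReadyException`, `RestartingNode`, and `callWithRetry` are hypothetical stand-ins (for NMNotReadyException, a restarting NM, and a retry wrapper around the launch call), and the attempt count and sleep interval are arbitrary.

```java
import java.util.concurrent.Callable;

public class AmLaunchRetrySketch {

    /** Hypothetical stand-in for NMNotReadyException; not the real Hadoop class. */
    static class NodeNotReadyException extends Exception {
        NodeNotReadyException(String message) {
            super(message);
        }
    }

    /**
     * Runs the call, retrying with a fixed sleep while it throws
     * NodeNotReadyException. Any other exception propagates immediately.
     * After maxAttempts failed attempts the last NodeNotReadyException is rethrown.
     */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        NodeNotReadyException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (NodeNotReadyException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis); // give the node time to finish restarting
                }
            }
        }
        throw last;
    }

    /** Simulated node that rejects a fixed number of calls before accepting one. */
    static class RestartingNode {
        private int failuresLeft;
        int calls = 0;

        RestartingNode(int failuresBeforeReady) {
            this.failuresLeft = failuresBeforeReady;
        }

        String startContainer(String containerId) throws NodeNotReadyException {
            calls++;
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new NodeNotReadyException("node is not ready yet");
            }
            return "started " + containerId;
        }
    }

    public static void main(String[] args) throws Exception {
        RestartingNode node = new RestartingNode(2);
        // Without the wrapper, the first NodeNotReadyException would fail the launch.
        String result = callWithRetry(() -> node.startContainer("am_container_1"), 5, 10L);
        System.out.println(result + " after " + node.calls + " attempts");
    }
}
```

      In this sketch only the "not ready" condition is retried, so genuine launch failures still surface immediately; whether the actual fix draws the line in the same place is not stated in this issue.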

        Attachments

        1. YARN-4180.001.patch
          9 kB
          Anubhav Dhoot
        2. YARN-4180.002.patch
          9 kB
          Anubhav Dhoot
        3. YARN-4180.002.patch
          9 kB
          Anubhav Dhoot
        4. YARN-4180.002.patch
          9 kB
          Karthik Kambatla
        5. YARN-4180-branch-2.7.2.txt
          10 kB
          Anubhav Dhoot

          Activity

            People

            • Assignee:
              adhoot Anubhav Dhoot
              Reporter:
              adhoot Anubhav Dhoot
            • Votes:
              0
              Watchers:
              9

              Dates

              • Created:
                Updated:
                Resolved: