Hadoop YARN / YARN-3644

Node manager shuts down if unable to connect with RM

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels:
      None

      Description

      When NM is unable to connect to RM, NM shuts itself down.

                } catch (ConnectException e) {
                  //catch and throw the exception if tried MAX wait time to connect RM
                  dispatcher.getEventHandler().handle(
                      new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
                  throw new YarnRuntimeException(e);
      

      In large clusters, if the RM is down for maintenance for an extended period, all the NMs shut themselves down, and additional work is required to bring them back up.

      Setting yarn.resourcemanager.connect.max-wait.ms to -1 has a side effect: failures other than connection failures are also retried indefinitely by all YarnClients (via RMProxy).
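
The distinction described above (retry connection failures for as long as needed, but do not retry other failures forever) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `RmConnectRetrySketch`, the method `callWithConnectRetry`, and the parameters `maxAttempts` and `sleepMs` are hypothetical, it does not use RMProxy or any Hadoop API, and it is not the attached patch.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch only: not RMProxy and not the YARN-3644 patch.
 * Retries an action on ConnectException and lets every other failure propagate.
 */
final class RmConnectRetrySketch {

    private RmConnectRetrySketch() {
    }

    /**
     * Calls the action until it succeeds.
     *
     * @param action      the call to attempt (for example, a registration with the RM)
     * @param maxAttempts maximum number of attempts; a negative value means
     *                    "retry connection failures forever"
     * @param sleepMs     pause between attempts, in milliseconds
     */
    static <T> T callWithConnectRetry(Callable<T> action, int maxAttempts, long sleepMs)
            throws Exception {
        if (maxAttempts == 0) {
            throw new IllegalArgumentException("maxAttempts must be non-zero");
        }
        int attempts = 0;
        while (true) {
            try {
                return action.call();
            } catch (ConnectException e) {
                // Only connection failures are retried; other exceptions are
                // not caught here and reach the caller immediately.
                attempts++;
                if (maxAttempts > 0 && attempts >= maxAttempts) {
                    // Hand the failure back to the caller rather than
                    // triggering a shutdown from inside the retry loop.
                    throw e;
                }
                Thread.sleep(sleepMs);
            }
        }
    }
}
```

With a negative `maxAttempts`, a caller in the NM's position would keep waiting for the RM to come back, while a client hitting a non-connection error would fail fast instead of looping.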

        Attachments

        1. YARN-3644.003.patch
          10 kB
          Raju Bairishetti
        2. YARN-3644.002.patch
          9 kB
          Raju Bairishetti
        3. YARN-3644.001.patch
          9 kB
          Raju Bairishetti
        4. YARN-3644.001.patch
          9 kB
          Raju Bairishetti
        5. YARN-3644.patch
          10 kB
          Raju Bairishetti

              People

              • Assignee:
                Raju Bairishetti (raju.bairishetti)
              • Reporter:
                Srikanth Sundarrajan (sriksun)
              • Votes:
                1
              • Watchers:
                21
