  1. Hadoop YARN
  2. YARN-3644

Node manager shuts down if unable to connect with RM

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels: None

    Description

      When NM is unable to connect to RM, NM shuts itself down.

                } catch (ConnectException e) {
                  //catch and throw the exception if tried MAX wait time to connect RM
                  dispatcher.getEventHandler().handle(
                      new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
                  throw new YarnRuntimeException(e);
      

      In large clusters, if the RM is down for maintenance for an extended period, all the NMs shut themselves down, and additional work is required to bring them back up.

      Setting yarn.resourcemanager.connect.max-wait.ms to -1 has a side effect: non-connection failures are also retried indefinitely by all YarnClients (via RMProxy).
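
      The distinction drawn above, between connection failures and every other kind of failure, could be sketched roughly as follows. This is only an illustration of the idea and not the attached patch: the class ConnectRetrySketch, the Outcome enum and the tryRound method are hypothetical names that do not exist in YARN, and the real NM/RMProxy code paths are more involved.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.Callable;

/** Hypothetical sketch; none of these names come from the YARN code base. */
public class ConnectRetrySketch {

  /** What the caller should do after one bounded round of attempts. */
  public enum Outcome { CONNECTED, KEEP_WAITING, FAIL }

  /**
   * Runs up to maxAttempts calls against the RM.
   * Connection failures end in KEEP_WAITING (stay up and try another round later);
   * any other failure ends in FAIL immediately instead of being retried.
   */
  public static Outcome tryRound(Callable<Void> register, int maxAttempts, long sleepMs)
      throws InterruptedException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        register.call();
        return Outcome.CONNECTED;
      } catch (ConnectException e) {
        // RM unreachable: back off before the next attempt in this round.
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMs);
        }
      } catch (Exception e) {
        // Non-connection failure: retrying is unlikely to help, so give up.
        return Outcome.FAIL;
      }
    }
    // Every attempt hit ConnectException: report it without shutting down.
    return Outcome.KEEP_WAITING;
  }

  public static void main(String[] args) throws InterruptedException {
    // RM down: the caller is told to keep waiting rather than to shut down.
    System.out.println(tryRound(() -> { throw new ConnectException("RM down"); }, 3, 10L));
    // Some other failure: reported at once, not retried.
    System.out.println(tryRound(() -> { throw new IOException("other failure"); }, 3, 10L));
    // RM reachable.
    System.out.println(tryRound(() -> null, 3, 10L));
  }
}
```

      With a policy of this shape, an NM could stay up across a long RM outage, while clients would still fail fast on errors that are not connection-related.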

      Attachments

        1. YARN-3644.patch
          10 kB
          Raju Bairishetti
        2. YARN-3644.003.patch
          10 kB
          Raju Bairishetti
        3. YARN-3644.002.patch
          9 kB
          Raju Bairishetti
        4. YARN-3644.001.patch
          9 kB
          Raju Bairishetti
        5. YARN-3644.001.patch
          9 kB
          Raju Bairishetti

            People

              Assignee: Raju Bairishetti
              Reporter: Srikanth Sundarrajan
              Votes: 1
              Watchers: 21
