Hadoop YARN / YARN-3644

Node manager shuts down if unable to connect with RM


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: nodemanager

    Description

      When the NodeManager (NM) is unable to connect to the ResourceManager (RM), the NM shuts itself down.

          } catch (ConnectException e) {
            // catch and throw the exception if tried MAX wait time to connect RM
            dispatcher.getEventHandler().handle(
                new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
            throw new YarnRuntimeException(e);
          }

      In large clusters, if the RM is down for maintenance for a longer period, all the NMs shut themselves down, and additional work is required to bring them back up.

      Setting yarn.resourcemanager.connect.max-wait.ms to -1 is not a good workaround: it has the side effect that non-connection failures are also retried indefinitely by all YarnClients (via RMProxy).
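
      The distinction drawn above, retrying connection failures without also retrying every other kind of failure, can be sketched in plain Java. This is only an illustration under stated assumptions, not the attached patch and not Hadoop's RMProxy code: the class ConnectRetrySketch, the method callWithRetry, and the retryIntervalMs parameter are hypothetical names introduced here.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration only; it is not part of Hadoop.
final class ConnectRetrySketch {

    /**
     * Runs the action and retries indefinitely, but only when it fails with
     * a ConnectException. Any other exception propagates on first occurrence.
     */
    static <T> T callWithRetry(Callable<T> action, long retryIntervalMs) throws Exception {
        while (true) {
            try {
                return action.call();
            } catch (ConnectException e) {
                // Peer unreachable: wait and try again instead of giving up.
                Thread.sleep(retryIntervalMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulated registration call that fails twice before succeeding.
        String result = callWithRetry(() -> {
            if (++attempts[0] < 3) {
                throw new ConnectException("RM not reachable");
            }
            return "registered";
        }, 10L);
        System.out.println(result + " after " + attempts[0] + " attempts");
        // prints "registered after 3 attempts"
    }
}
```

      In this sketch a long RM outage only delays the caller, while a failure unrelated to connectivity still surfaces immediately. Whether an NM should wait without any upper bound is a separate design question that the sketch does not settle.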

      Attachments

        1. YARN-3644.patch
          10 kB
          Raju Bairishetti
        2. YARN-3644.001.patch
          9 kB
          Raju Bairishetti
        3. YARN-3644.001.patch
          9 kB
          Raju Bairishetti
        4. YARN-3644.002.patch
          9 kB
          Raju Bairishetti
        5. YARN-3644.003.patch
          10 kB
          Raju Bairishetti


          People

            raju.bairishetti Raju Bairishetti
            sriksun Srikanth Sundarrajan
