Hadoop YARN / YARN-4321

Incessant retries if NoAuthException is thrown by ZooKeeper in non-HA mode


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.7.2
    • Component/s: resourcemanager
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      This applies only to branch-2.7 and earlier code.
      When a NoAuthException is thrown in non-HA mode (as in the scenario of YARN-4127), the RM keeps retrying the ZK operation indefinitely.

      2015-10-23 09:22:10,209 DEBUG [SyncThread:0] server.DataTree (DataTree.java:processTxn(949)) - Ignoring processTxn failure hdr: -1 : error: -102
      2015-10-23 09:22:10,210 DEBUG [main-SendThread(127.0.0.1:11221)] zookeeper.ClientCnxn (ClientCnxn.java:readResponse(818)) - Reading reply sessionid:0x15092d1ebe10001, packet:: clientPath:null serverPath:null finished:false header:: 7591,1  replyHeader:: 7591,7610,-102  request:: '/rmstore/ZKRMStateRoot/RMAppRoot,,v{s{31,s{'world,'anyone}}},0  response::
      2015-10-23 09:22:10,210 INFO  [ProcessThread(sid:0 cport:-1):] server.PrepRequestProcessor (PrepRequestProcessor.java:pRequest(645)) - Got user-level KeeperException when processing sessionid:0x15092d1ebe10001 type:create cxid:0x1da8 zxid:0x1dbb txntype:-1 reqpath:n/a Error Path:null Error:KeeperErrorCode = NoAuth
      

      This happens because branch-2.7 does not handle NoAuthException properly when HA is not enabled.
      The relevant code in ZKRMStateStore#runWithRetries is shown below. When HA is not enabled, the catch block neither rethrows the NoAuthException nor increments the retry count, so the loop never backs out once retries are exhausted.

      T runWithRetries() throws Exception {
        int retry = 0;
        while (true) {
          try {
            return runWithCheck();
          } catch (KeeperException.NoAuthException nae) {
            if (HAUtil.isHAEnabled(getConfig())) {
              // NoAuthException possibly means that this store is fenced due to
              // another RM becoming active. Even if not,
              // it is safer to assume we have been fenced
              throw new StoreFencedException();
            }
            // Non-HA: the exception is swallowed here and retry is never
            // incremented, so the loop repeats indefinitely.
          } catch (KeeperException ke) {
            .............
          }
        }
      }
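
      One possible way to bound the loop is sketched below. This is only an illustration of the missing logic described above, not the contents of the attached patch. The class BoundedRetrySketch, its nested exception types, and the parameters haEnabled, maxRetries, and retryIntervalMs are hypothetical stand-ins so that the example compiles without ZooKeeper or Hadoop on the classpath.

```java
import java.util.concurrent.Callable;

public class BoundedRetrySketch {

    /** Hypothetical stand-in for KeeperException.NoAuthException. */
    public static class NoAuthException extends Exception {}

    /** Hypothetical stand-in for StoreFencedException. */
    public static class StoreFencedException extends Exception {}

    /**
     * Runs op, retrying on NoAuthException at most maxRetries times
     * when HA is disabled. All parameters are assumptions made for
     * this sketch; the real store reads its settings from configuration.
     */
    public static <T> T runWithRetries(Callable<T> op, boolean haEnabled,
                                       int maxRetries, long retryIntervalMs)
            throws Exception {
        int retry = 0;
        while (true) {
            try {
                return op.call();
            } catch (NoAuthException nae) {
                if (haEnabled) {
                    // Same behavior as the original code: assume fencing.
                    throw new StoreFencedException();
                }
                // Non-HA: count the failure and give up once the
                // retry budget is exhausted instead of looping forever.
                if (++retry > maxRetries) {
                    throw nae;
                }
                Thread.sleep(retryIntervalMs);
            }
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        try {
            runWithRetries(() -> {
                attempts[0]++;
                throw new NoAuthException();
            }, false, 3, 10L);
        } catch (Exception e) {
            System.out.println("Gave up after " + attempts[0]
                + " attempts: " + e.getClass().getSimpleName());
        }
    }
}
```

      With maxRetries set to 3, the operation is attempted once and then retried three times before the NoAuthException is rethrown to the caller. Whether the real fix should retry at all on NoAuth, or rethrow immediately, is a design choice this sketch does not settle.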
      

      Attachments

        1. YARN-4321-branch-2.7.01.patch
          6 kB
          Varun Saxena


          People

            Assignee: Varun Saxena (varun_saxena)
            Reporter: Varun Saxena (varun_saxena)
            Votes: 0
            Watchers: 5

