Hadoop YARN / YARN-7962

Race Condition When Stopping DelegationTokenRenewer causes RM crash during failover

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.2.0, 3.1.1
    • Component/s: resourcemanager
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      https://github.com/apache/hadoop/blob/69fa81679f59378fd19a2c65db8019393d7c05a2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java

        private ThreadPoolExecutor renewerService;
      
        private void processDelegationTokenRenewerEvent(
            DelegationTokenRenewerEvent evt) {
          serviceStateLock.readLock().lock();
          try {
            if (isServiceStarted) {
              renewerService.execute(new DelegationTokenRenewerRunnable(evt));
            } else {
              pendingEventQueue.add(evt);
            }
          } finally {
            serviceStateLock.readLock().unlock();
          }
        }
      
        @Override
        protected void serviceStop() {
          if (renewalTimer != null) {
            renewalTimer.cancel();
          }
          appTokens.clear();
          allTokens.clear();
          this.renewerService.shutdown();
          // ... (remainder of serviceStop() not shown)
      
      2018-02-21 11:18:16,253  FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
      java.util.concurrent.RejectedExecutionException: Task org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable@39bddaf2 rejected from java.util.concurrent.ThreadPoolExecutor@5f71637b[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 15487]
      	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
      	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
      	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
      	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.processDelegationTokenRenewerEvent(DelegationTokenRenewer.java:196)
      	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.applicationFinished(DelegationTokenRenewer.java:734)
      	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.finishApplication(RMAppManager.java:199)
      	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:424)
      	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:65)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:177)
      	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
      	at java.lang.Thread.run(Thread.java:745)
      

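      What I think is going on here is that the serviceStop method never sets the isServiceStarted flag to false. As a result, an event that arrives after the renewerService thread pool has been shut down still takes the isServiceStarted branch and is passed to renewerService.execute(), which throws the RejectedExecutionException shown above in the AsyncDispatcher thread and crashes the RM.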
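
The rejection itself can be reproduced outside Hadoop. The following is a minimal sketch, not code from the ResourceManager: the class name RejectedAfterShutdownDemo and its helper method are made up for illustration. It relies only on the documented behaviour of a default java.util.concurrent thread pool, which rejects new tasks once shutdown() has been called.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical demo class; it only mirrors the failing call sequence.
final class RejectedAfterShutdownDemo {

  /** Returns true if a task submitted after shutdown() is rejected. */
  static boolean rejectsAfterShutdown() {
    // Default rejection handler is AbortPolicy, as in the stack trace.
    ExecutorService pool = Executors.newFixedThreadPool(1);
    pool.shutdown();
    try {
      pool.execute(() -> { });
      return false;
    } catch (RejectedExecutionException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("rejected: " + rejectsAfterShutdown());
  }
}
```

In the RM the exception is not caught, so it propagates out of the event handler into the dispatcher thread.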

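      Please update the serviceStop method so that it acquires the serviceStateLock write lock and sets isServiceStarted to false before shutting down the renewerService thread pool. Events that arrive after that point will then be added to pendingEventQueue instead of being submitted to a terminated executor.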
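
The requested change might look roughly like the sketch below. It is a simplified, hypothetical stand-in (RenewerSketch, start, stop, processEvent and pendingCount are illustrative names), not the committed patch. Two assumptions are made: serviceStateLock is a ReentrantReadWriteLock, as the readLock() calls above suggest, and the flag is flipped under its write lock.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for DelegationTokenRenewer.
final class RenewerSketch {
  private final ReentrantReadWriteLock serviceStateLock =
      new ReentrantReadWriteLock();
  private final LinkedBlockingQueue<Runnable> pendingEventQueue =
      new LinkedBlockingQueue<>();
  private final ExecutorService renewerService =
      Executors.newFixedThreadPool(2);
  private boolean isServiceStarted; // guarded by serviceStateLock

  void start() {
    serviceStateLock.writeLock().lock();
    try {
      isServiceStarted = true;
    } finally {
      serviceStateLock.writeLock().unlock();
    }
  }

  // Same shape as processDelegationTokenRenewerEvent in the report.
  void processEvent(Runnable evt) {
    serviceStateLock.readLock().lock();
    try {
      if (isServiceStarted) {
        renewerService.execute(evt);
      } else {
        pendingEventQueue.add(evt);
      }
    } finally {
      serviceStateLock.readLock().unlock();
    }
  }

  void stop() {
    // Assumed fix: flip the flag under the write lock first. Acquiring
    // the write lock waits for in-flight readers, so no thread can still
    // be inside execute() once the flag is false.
    serviceStateLock.writeLock().lock();
    try {
      isServiceStarted = false;
    } finally {
      serviceStateLock.writeLock().unlock();
    }
    renewerService.shutdown();
  }

  int pendingCount() {
    return pendingEventQueue.size();
  }
}
```

With this ordering, an event that arrives after stop() is queued rather than rejected. In this sketch nothing drains the queue afterwards; whether such late events should be replayed or dropped is a separate decision.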

      Attachments

        1. YARN-7962.7.patch
          3 kB
          Billie Rinaldi
        2. YARN-7962.6.patch
          4 kB
          Wangda Tan
        3. YARN-7962.4.patch
          4 kB
          David Mollitor
        4. YARN-7962.3.patch
          4 kB
          Billie Rinaldi
        5. YARN-7962.2.patch
          3 kB
          David Mollitor
        6. YARN-7962.1.patch
          2 kB
          David Mollitor

            People

              Assignee: David Mollitor (belugabehr)
              Reporter: David Mollitor (belugabehr)
              Votes: 0
              Watchers: 5
