Hadoop YARN / YARN-11384

NPE in DelegationTokenRenewer causes all subsequent apps to fail with "Timer already cancelled"


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: yarn

    Description

      All newly submitted YARN apps start failing with the following error in the diagnostic message:

      org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: Unable to add the application to the delegation token renewer.
      java.lang.IllegalStateException: Timer already cancelled.
              at java.util.Timer.sched(Timer.java:397)
              at java.util.Timer.schedule(Timer.java:208)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.setTimerForTokenRenewal(DelegationTokenRenewer.java:604)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:515)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$900(DelegationTokenRenewer.java:79)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:923)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:900)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      

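      Any uncaught exception thrown from DelegationTokenRenewer.RenewalTimerTask#run terminates the java.util.Timer thread. Every later attempt to schedule a renewal on that timer then fails, so all subsequent YARN apps fail with java.lang.IllegalStateException: Timer already cancelled.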
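
      The java.util.Timer behaviour behind this symptom can be reproduced outside YARN. The following is a minimal sketch, not the DelegationTokenRenewer code: the class name TimerCancelledDemo, the method scheduleAfterFailure, and the "guard" flag are hypothetical names used only for illustration. It schedules a task that throws an NPE, waits for the timer thread to die, and then tries to schedule again. With the guard enabled, the exception is caught inside the task, which is one possible way to keep the timer usable.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class TimerCancelledDemo {

    /**
     * Runs a task that throws an NPE, then tries to schedule a second task.
     * Returns the message of the IllegalStateException raised by the second
     * schedule() call, or null if scheduling still works.
     * The "guard" flag is a hypothetical switch: when true, the task catches
     * its own exception so it never reaches the timer thread.
     */
    static String scheduleAfterFailure(boolean guard) {
        Timer timer = new Timer("demo-timer", true);
        AtomicReference<Thread> timerThread = new AtomicReference<>();
        CountDownLatch ran = new CountDownLatch(1);
        try {
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    timerThread.set(Thread.currentThread());
                    ran.countDown();
                    try {
                        throw new NullPointerException("simulated NPE");
                    } catch (RuntimeException e) {
                        if (!guard) {
                            throw e; // escapes run() and kills the timer thread
                        }
                        // guarded: a real task would log the failure here
                    }
                }
            }, 0);

            if (!ran.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("first task did not run");
            }
            if (!guard) {
                // Wait until the timer thread has actually terminated.
                timerThread.get().join(5000);
            }

            // Second submission, analogous to a newly submitted app.
            timer.schedule(new TimerTask() {
                @Override
                public void run() { }
            }, 0);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + scheduleAfterFailure(false));
        System.out.println("guarded:   " + scheduleAfterFailure(true));
    }
}
```

      In the unguarded run the JVM also prints the simulated NPE stack trace from the timer thread. Whether catching the exception inside the task is the right fix for DelegationTokenRenewer is a separate design question; the sketch only shows why one escaped exception affects every later schedule() call.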

      One such NPE is thrown when DelegationTokenRenewer.RenewalTimerTask#run invokes DelegationTokenRenewer#removeFailedDelegationToken, which tries to remove the token from DelegationTokenRenewer#appTokens for an applicationId.

      If the DelegationTokenRenewer#appTokens map has no <ApplicationId, Set<DelegationTokenToRenew>> entry for the application while the token (DelegationTokenToRenew) still holds a reference to that applicationId, an NPE is thrown, leading to "Timer already cancelled".
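
      The failure mode described above, a map lookup that returns null followed by a call on the result, can be sketched as follows. This is not the actual body of removeFailedDelegationToken. The class AppTokensSketch, the methods removeUnguarded and removeGuarded, and the use of plain strings in place of ApplicationId and DelegationTokenToRenew are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;

class AppTokensSketch {

    /**
     * Mirrors the pattern described in the report: look up the token set
     * for an application and remove the token from it. Throws
     * NullPointerException when the map has no entry for appId.
     */
    static boolean removeUnguarded(Map<String, Set<String>> appTokens,
                                   String appId, String token) {
        return appTokens.get(appId).remove(token);
    }

    /**
     * Hypothetical null-safe variant: a missing entry is treated as
     * "nothing to remove" instead of raising an exception.
     */
    static boolean removeGuarded(Map<String, Set<String>> appTokens,
                                 String appId, String token) {
        Set<String> tokens = appTokens.get(appId);
        if (tokens == null) {
            return false; // the app's entry is already gone
        }
        return tokens.remove(token);
    }
}
```

      With an empty map, removeUnguarded throws the NPE while removeGuarded returns false. A null check of this kind avoids this particular NPE but does not protect the timer from other uncaught exceptions in the task.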

       


            People

              adsharma Aditya Sharma
