Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-10305

Lost system-credentials when restarting RM

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Patch Available
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None

    Description

      System-credentials introduced in YARN-2704, it makes it to keep the long-running apps.
      I’ve met a situation where system-credentials lost when restarting RM.
      Since then, if an app’s AM is stopped, restarting AM will be failed because NMs do not have HDFS delegation token which is needed for resource localization.

      The app has a couple of delegation token including timeline-server token and HDFS delegation token.
      When restarting RM, RM will request a new HDFS delegation token for an app that was submitted long ago. (It's fixed by YARN-5098)
      But, If an app has a couple of delegation token and an exception occur in the token processed first, the next tokens are not processed.
      I think that’s why lost system-credentials.

      Here are RM’s logs at the time of restarting RM.

      2020-05-19 14:25:05,712 WARN  security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleDTRenewerAppRecoverEvent(955)) - Unable to add the application to the delegation token renewer on recovery.
      java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 10.1.1.1:8190, Ident: (TIMELINE_DELEGATION_TOKEN owner=test-admin, renewer=yarn, realUser=yarn, issueDate=1586136363258, maxDate=1587000363258, sequenceNumber=2193, masterKeyId=340)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:503)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.IOException: HTTP status [403], message [org.apache.hadoop.security.token.SecretManager$InvalidToken: yarn tried to renew an expired token (TIMELINE_DELEGATION_TOKEN owner=test-admin, renewer=yarn, realUser=yarn, issueDate=1586136363258, maxDate=1587000363258, sequenceNumber=2193, masterKeyId=340) max expiration date: 2020-04-16 10:26:03,258+0900 currentTime: 2020-05-19 14:25:05,700+0900]
              at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:166)
              at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:319)
              at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:235)
              at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:437)
              at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:247)
              at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:227)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
              at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineClientRetryOpForOperateDelegationToken.run(TimelineConnector.java:431)
              at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineClientConnectionRetry.retryOn(TimelineConnector.java:334)
              at org.apache.hadoop.yarn.client.api.impl.TimelineConnector.operateDelegationToken(TimelineConnector.java:218)
              at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:250)
              at org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81)
              at org.apache.hadoop.security.token.Token.renew(Token.java:512)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:629)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:626)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:625)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:489)
              ... 6 more
      
      

      Attachments

        1. YARN-10305.001.patch
          9 kB
          kyungwan nam

        Issue Links

          Activity

            People

              kyungwan nam kyungwan nam
              kyungwan nam kyungwan nam
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: