Hadoop YARN / YARN-10305

Lost system-credentials when restarting RM


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved

    Description

      System credentials were introduced in YARN-2704 to keep long-running apps working.
      I ran into a situation where the system credentials were lost when the RM was restarted.
      After that, if an app's AM stops, restarting the AM fails because the NMs do not have the HDFS delegation token that is needed for resource localization.

      The app has several delegation tokens, including a timeline-server token and an HDFS delegation token.
      When the RM restarts, it requests a new HDFS delegation token for an app that was submitted long ago. (This was fixed by YARN-5098.)
      However, if an app has several delegation tokens and an exception occurs in the token that is processed first, the remaining tokens are not processed.
      I think that is why the system credentials were lost.

      Here are the RM's logs from the time of the restart.
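
      The suspected failure mode can be illustrated with a minimal, self-contained sketch. It is not the DelegationTokenRenewer code and not the attached patch: the class, the method names, and the simulated renew() are hypothetical and only model "the first token throws, so the later tokens are skipped", next to one possible alternative that handles each token separately.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the behavior described above; not Hadoop code.
public class TokenRecoverySketch {

    // Stand-in for renewing one token: the timeline token is assumed to be expired.
    static void renew(String kind) throws IOException {
        if (kind.equals("TIMELINE_DELEGATION_TOKEN")) {
            throw new IOException("Failed to renew token: Kind: " + kind);
        }
    }

    // Fail-fast loop: the first exception propagates, so later tokens are never reached.
    static List<String> recoverFailFast(List<String> kinds) throws IOException {
        List<String> processed = new ArrayList<>();
        for (String kind : kinds) {
            renew(kind);
            processed.add(kind);
        }
        return processed;
    }

    // Per-token handling: a failure is logged and the loop continues with the next token.
    static List<String> recoverPerToken(List<String> kinds) {
        List<String> processed = new ArrayList<>();
        for (String kind : kinds) {
            try {
                renew(kind);
                processed.add(kind);
            } catch (IOException e) {
                System.out.println("WARN skipping token: " + e.getMessage());
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> kinds =
            Arrays.asList("TIMELINE_DELEGATION_TOKEN", "HDFS_DELEGATION_TOKEN");
        try {
            recoverFailFast(kinds);
        } catch (IOException e) {
            // The HDFS token was never processed in this variant.
            System.out.println("fail-fast aborted: " + e.getMessage());
        }
        System.out.println("per-token processed: " + recoverPerToken(kinds));
    }
}
```

      In the fail-fast variant the HDFS token is never reached, which matches the symptom that the NMs end up without an HDFS delegation token.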

      2020-05-19 14:25:05,712 WARN  security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleDTRenewerAppRecoverEvent(955)) - Unable to add the application to the delegation token renewer on recovery.
      java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 10.1.1.1:8190, Ident: (TIMELINE_DELEGATION_TOKEN owner=test-admin, renewer=yarn, realUser=yarn, issueDate=1586136363258, maxDate=1587000363258, sequenceNumber=2193, masterKeyId=340)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:503)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.IOException: HTTP status [403], message [org.apache.hadoop.security.token.SecretManager$InvalidToken: yarn tried to renew an expired token (TIMELINE_DELEGATION_TOKEN owner=test-admin, renewer=yarn, realUser=yarn, issueDate=1586136363258, maxDate=1587000363258, sequenceNumber=2193, masterKeyId=340) max expiration date: 2020-04-16 10:26:03,258+0900 currentTime: 2020-05-19 14:25:05,700+0900]
              at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:166)
              at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:319)
              at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:235)
              at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:437)
              at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:247)
              at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:227)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
              at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineClientRetryOpForOperateDelegationToken.run(TimelineConnector.java:431)
              at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineClientConnectionRetry.retryOn(TimelineConnector.java:334)
              at org.apache.hadoop.yarn.client.api.impl.TimelineConnector.operateDelegationToken(TimelineConnector.java:218)
              at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:250)
              at org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81)
              at org.apache.hadoop.security.token.Token.renew(Token.java:512)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:629)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:626)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:422)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:625)
              at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:489)
              ... 6 more
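
      To read the timestamps in the log, the epoch-millisecond fields can be converted as in the sketch below. The class and method names are hypothetical; the constants are copied from the log above (issueDate, maxDate, and the currentTime reported in the InvalidToken message).

```java
import java.time.Duration;
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical helper for interpreting the token timestamps in the RM log.
public class TokenExpirySketch {
    static final long ISSUE_DATE_MS = 1586136363258L; // issueDate from the log
    static final long MAX_DATE_MS = 1587000363258L;   // maxDate from the log
    static final ZoneOffset KST = ZoneOffset.ofHours(9); // the log uses +0900

    // currentTime from the log: 2020-05-19 14:25:05,700+0900
    static final OffsetDateTime RESTART =
        OffsetDateTime.of(2020, 5, 19, 14, 25, 5, 700_000_000, KST);

    // Whole days between issueDate and maxDate.
    static long lifetimeDays() {
        return Duration.ofMillis(MAX_DATE_MS - ISSUE_DATE_MS).toDays();
    }

    // Whole days between maxDate and the renewal attempt at RM restart.
    static long daysExpiredAtRestart() {
        return Duration.between(Instant.ofEpochMilli(MAX_DATE_MS), RESTART.toInstant()).toDays();
    }

    public static void main(String[] args) {
        System.out.println("maxDate = " + Instant.ofEpochMilli(MAX_DATE_MS).atOffset(KST));
        System.out.println("lifetime days = " + lifetimeDays());
        System.out.println("days expired at restart = " + daysExpiredAtRestart());
    }
}
```

      With these values the timeline token had a 10-day maximum lifetime and had been expired for about 33 days when the RM tried to renew it on recovery.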
      
      

      Attachments

        1. YARN-10305.001.patch
          9 kB
          kyungwan nam


          People

            kyungwan nam
