SPARK-15754: org.apache.spark.deploy.yarn.Client changes the credential of current user


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 1.6.2, 2.0.0
    • Component/s: None
    • Labels: None
    • Environment: Spark Client with Secured Hadoop Cluster

    Description

      Problem

      Spawning a SparkContext in Spark client mode changes the credentials in the current user's UserGroupInformation. As a result, the client (which spawned the SparkContext) no longer talks to the NameNode using its Kerberos TGT but uses delegation tokens instead. It is undesirable for a library to change the JVM-wide UserGroupInformation context in this way.
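
      The effect described above can be illustrated with a toy model of a process-global user context (analogous to Hadoop's UserGroupInformation singleton; all class and method names below are hypothetical, not Hadoop APIs):

```java
import java.util.HashSet;
import java.util.Set;

// Toy stand-in for a JVM-global user context such as Hadoop's
// UserGroupInformation. Names are illustrative only.
final class ToyUserContext {
    private static final ToyUserContext CURRENT = new ToyUserContext();
    private final Set<String> credentials = new HashSet<>(Set.of("KERBEROS_TGT"));

    static ToyUserContext getCurrentUser() { return CURRENT; }

    void addCredential(String c) { credentials.add(c); }

    // A delegation token, once present, shadows the TGT for every
    // later caller in this JVM.
    String pickAuthMethod() {
        return credentials.contains("HDFS_DELEGATION_TOKEN")
                ? "HDFS_DELEGATION_TOKEN" : "KERBEROS_TGT";
    }
}

public class ClientSideEffect {
    // Models the launch path adding the freshly created delegation token
    // to the *current* user's credentials as a side effect.
    static void launchApplicationMaster() {
        ToyUserContext.getCurrentUser().addCredential("HDFS_DELEGATION_TOKEN");
    }

    public static void main(String[] args) {
        System.out.println(ToyUserContext.getCurrentUser().pickAuthMethod()); // KERBEROS_TGT
        launchApplicationMaster();
        // Every later NameNode call in this JVM now authenticates with the token:
        System.out.println(ToyUserContext.getCurrentUser().pickAuthMethod()); // HDFS_DELEGATION_TOKEN
    }
}
```

      Because the mutated object is a process-wide singleton, the change outlives the SparkContext itself, which is exactly what makes the token cancellation below visible to the client.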

      Root Cause

      Spark creates an HDFS delegation token so that the spawned Application Master can communicate with the NameNode, but while creating this token Spark also adds it to the current user's credentials:

      org.apache.spark.deploy.yarn.Client#createContainerLaunchContext:
          setupSecurityToken(amContainer)
          UserGroupInformation.getCurrentUser().addCredentials(credentials)

          amContainer

      After this operation the client always uses the delegation token for any further communication with the NameNode. The scenario becomes dangerous because the ResourceManager cancels the delegation token 10 minutes after the SparkContext is shut down, which leads to client-side failures like:

      org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 444 for subroto) can't be found in cache
      	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1403)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
      	at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
      	at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
      	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2095)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1214)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1210)
      	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1210)
      	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1409)
      	at Sample.main(Sample.java:85)

      Similar operations are performed in other places in the code as well, for example in:
      org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater.updateCredentialsIfRequired()
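
      One general way to avoid this class of problem (a sketch of the pattern only, not the actual SPARK-15754 patch) is to hand the new token to the container being launched in a copied credential set, instead of mutating the launching user's global state:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the safer pattern: credentials destined for the Application
// Master are copied into the container's own set, leaving the launching
// client's credentials untouched. All names here are illustrative.
public class ScopedCredentials {
    static final Set<String> CURRENT_USER_CREDENTIALS =
            new HashSet<>(Set.of("KERBEROS_TGT"));

    // Build the AM container's credentials from a copy, then add the
    // delegation token only to that copy.
    static Set<String> buildContainerCredentials() {
        Set<String> forContainer = new HashSet<>(CURRENT_USER_CREDENTIALS);
        forContainer.add("HDFS_DELEGATION_TOKEN");
        return forContainer;
    }

    public static void main(String[] args) {
        Set<String> container = buildContainerCredentials();
        System.out.println(container.contains("HDFS_DELEGATION_TOKEN"));                // true
        System.out.println(CURRENT_USER_CREDENTIALS.contains("HDFS_DELEGATION_TOKEN")); // false
    }
}
```

      With this shape, cancelling the container's token after shutdown cannot affect the client, because the client's own credentials never contained it.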

          People

            Assignee: Subroto Sanyal
            Reporter: Subroto Sanyal
