  1. Hadoop HDFS
  2. HDFS-9276

Failed to Update HDFS Delegation Token for long running application in HA mode

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.9.0, 3.0.0-alpha1, 2.8.2
    • Component/s: fs, ha, security
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The Scenario is as follows:
      1. NameNode HA is enabled.
      2. Kerberos is enabled.
      3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with NameNode.
      4. We want to update the HDFS Delegation Token for long-running applications. The HDFS client generates a private token for each NameNode from the delegation token. When the HDFS Delegation Token is updated, these private tokens are not updated, so they eventually expire and NameNode calls fail with a token-expired error.
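
      The stale-copy behaviour in step 4 can be illustrated with a small model that does not depend on Hadoop. This is only a sketch of the behaviour described in this report, not Hadoop's actual code: `Token`, `Creds`, `cloneForNameNodes`, the service name `ha-hdfs:mycluster`, and the NameNode addresses are all hypothetical, and the model assumes the per-NameNode copies are made once, when the client is first created.

```scala
import scala.collection.mutable

// Simplified stand-in for a delegation token: an id plus an expiry timestamp.
final case class Token(id: Int, expiresAtMs: Long)

// Simplified stand-in for a per-user credentials store, keyed by service name.
final class Creds {
  private val tokens = mutable.Map.empty[String, Token]

  def add(service: String, token: Token): Unit = tokens.update(service, token)

  def get(service: String): Option[Token] = tokens.get(service)

  // Copy the token stored under the logical HA service to each NameNode address.
  def cloneForNameNodes(logical: String, nameNodes: Seq[String]): Unit =
    get(logical).foreach(t => nameNodes.foreach(nn => add(nn, t)))
}

object PrivateTokenDemo {
  val Logical = "ha-hdfs:mycluster"           // hypothetical logical service name
  val NameNodes = Seq("nn1:8020", "nn2:8020") // hypothetical NameNode addresses

  // Returns (logical token, private token for the first NameNode) after an update.
  def updateLogicalToken(): (Token, Token) = {
    val creds = new Creds
    creds.add(Logical, Token(1, 180000L))
    // Assumption: cloning happens once, when the client is first created.
    creds.cloneForNameNodes(Logical, NameNodes)
    // The application later replaces only the logical token.
    creds.add(Logical, Token(2, 360000L))
    (creds.get(Logical).get, creds.get(NameNodes.head).get)
  }

  def main(args: Array[String]): Unit = {
    val (logical, priv) = updateLogicalToken()
    println(s"logical token id: ${logical.id}, private token id: ${priv.id}")
  }
}
```

      In this model the logical entry holds the new token while the per-NameNode entry still holds the old one, which matches the expired-token symptom reported here.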

      This bug can be reproduced by the following program:

      import java.security.PrivilegedExceptionAction
      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.{FileSystem, Path}
      import org.apache.hadoop.security.UserGroupInformation
      
      object HadoopKerberosTest {
      
        def main(args: Array[String]): Unit = {
          val keytab = "/path/to/keytab/xxx.keytab"
          val principal = "xxx@ABC.COM"
      
          val creds1 = new org.apache.hadoop.security.Credentials()
          val ugi1 = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
          ugi1.doAs(new PrivilegedExceptionAction[Void] {
            // As the keytab user, fetch HDFS delegation tokens (renewer "test") into creds1
            override def run(): Void = {
              val fs = FileSystem.get(new Configuration())
              fs.addDelegationTokens("test", creds1)
              null
            }
          })
      
          val ugi = UserGroupInformation.createRemoteUser("test")
          ugi.addCredentials(creds1)
          ugi.doAs(new PrivilegedExceptionAction[Void] {
            // Run as "test", authenticated only by the delegation tokens in creds1
            override def run(): Void = {
              var i = 0
              while (true) {
                val creds1 = new org.apache.hadoop.security.Credentials()
                val ugi1 = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
                ugi1.doAs(new PrivilegedExceptionAction[Void] {
                  // As the keytab user, fetch fresh delegation tokens on every iteration
                  override def run(): Void = {
                    val fs = FileSystem.get(new Configuration())
                    fs.addDelegationTokens("test", creds1)
                    null
                  }
                })
                UserGroupInformation.getCurrentUser.addCredentials(creds1)
      
                val fs = FileSystem.get( new Configuration())
                i += 1
                println()
                println(i)
                println(fs.listFiles(new Path("/user"), false))
                Thread.sleep(60 * 1000)
              }
              null
            }
          })
        }
      }
      

      To reproduce the bug, set the following configuration on the NameNode:

      dfs.namenode.delegation.token.max-lifetime = 10min
      dfs.namenode.delegation.key.update-interval = 3min
      dfs.namenode.delegation.token.renew-interval = 3min
      

      The bug occurs after 3 minutes.
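
      The intervals above are written in shorthand. To my knowledge these NameNode properties take their values in milliseconds, so the sketch below converts the three settings from this report and prints them as `hdfs-site.xml` property entries. The object name `ReproIntervals` and the helper `toProperty` are hypothetical; check the millisecond assumption against the `hdfs-default.xml` of the Hadoop version in use.

```scala
import scala.concurrent.duration._

object ReproIntervals {
  // The three settings from this report, expressed as durations.
  val settings: Seq[(String, FiniteDuration)] = Seq(
    "dfs.namenode.delegation.token.max-lifetime" -> 10.minutes,
    "dfs.namenode.delegation.key.update-interval" -> 3.minutes,
    "dfs.namenode.delegation.token.renew-interval" -> 3.minutes
  )

  // Render one property entry with the value in milliseconds (assumed unit).
  def toProperty(name: String, value: FiniteDuration): String =
    s"<property><name>$name</name><value>${value.toMillis}</value></property>"

  def main(args: Array[String]): Unit =
    settings.foreach { case (name, value) => println(toProperty(name, value)) }
}
```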

      The stack trace is:

      Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
      	at org.apache.hadoop.ipc.Client.call(Client.java:1347)
      	at org.apache.hadoop.ipc.Client.call(Client.java:1300)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
      	at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
      	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
      	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
      	at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
      	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
      	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
      	at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:747)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$15.<init>(DistributedFileSystem.java:726)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:717)
      	at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1780)
      	at org.apache.hadoop.fs.FileSystem$5.<init>(FileSystem.java:1842)
      	at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1839)
      	at HadoopKerberosTest6$$anon$2.run(HadoopKerberosTest6.scala:55)
      	at HadoopKerberosTest6$$anon$2.run(HadoopKerberosTest6.scala:32)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
      	at HadoopKerberosTest6$.main(HadoopKerberosTest6.scala:32)
      	at HadoopKerberosTest6.main(HadoopKerberosTest6.scala)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
      

        Attachments

        1. debug1.PNG
          448 kB
          Liangliang Gu
        2. debug2.PNG
          163 kB
          Liangliang Gu
        3. HDFS-9276.01.patch
          3 kB
          Liangliang Gu
        4. HDFS-9276.02.patch
          3 kB
          Liangliang Gu
        5. HDFS-9276.03.patch
          3 kB
          Liangliang Gu
        6. HDFS-9276.04.patch
          6 kB
          Liangliang Gu
        7. HDFS-9276.05.patch
          5 kB
          Liangliang Gu
        8. HDFS-9276.06.patch
          8 kB
          Liangliang Gu
        9. HDFS-9276.07.patch
          8 kB
          Liangliang Gu
        10. HDFS-9276.08.patch
          7 kB
          Liangliang Gu
        11. HDFS-9276.09.patch
          7 kB
          Liangliang Gu
        12. HDFS-9276.10.patch
          6 kB
          Liangliang Gu
        13. HDFS-9276.11.patch
          6 kB
          Liangliang Gu
        14. HDFS-9276.12.patch
          6 kB
          Liangliang Gu
        15. HDFS-9276.13.patch
          6 kB
          Liangliang Gu
        16. HDFS-9276.14.patch
          7 kB
          John Zhuge
        17. HDFS-9276.15.patch
          6 kB
          John Zhuge
        18. HDFS-9276.16.patch
          6 kB
          John Zhuge
        19. HDFS-9276.17.patch
          6 kB
          John Zhuge
        20. HDFS-9276.18.patch
          6 kB
          John Zhuge
        21. HDFS-9276.19.patch
          7 kB
          John Zhuge
        22. HDFS-9276.20.patch
          7 kB
          John Zhuge
        23. HDFSReadLoop.scala
          0.6 kB
          John Zhuge

              People

              • Assignee:
                marsishandsome Liangliang Gu
                Reporter:
                marsishandsome Liangliang Gu
              • Votes:
                8
                Watchers:
                51
