FLINK-28291

Add a Kerberos delegation token renewer instead of logging in from the keytab in each process individually


Details

    Description

      1. Design

      Lifecycle of the delegation token (DT) in the RM:

      1. The container starts with the DT provided by the client.
      2. Enable the delegation token renewer by:
        1. setting security.kerberos.token.renew.enabled to true (default: false), and
        2. specifying security.kerberos.login.keytab and security.kerberos.login.principal.
      3. When the renewer is enabled, the renewer thread re-obtains tokens from the DelegationTokenProvider (currently only HadoopFSDelegationTokenProvider). It then broadcasts the new tokens to the RM locally, and to all JMs and TMs via RPCGateway.
      4. The RM process adds the new tokens to its context via UserGroupInformation.
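
      The renewal round in steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the code from the patch: TokenRenewerSketch, TokenReceiver, renewOnce and start are hypothetical names, the Supplier stands in for a DelegationTokenProvider, and tokens are treated as opaque byte arrays so the example runs without Hadoop.

      ```java
      import java.nio.charset.StandardCharsets;
      import java.util.List;
      import java.util.concurrent.CopyOnWriteArrayList;
      import java.util.concurrent.Executors;
      import java.util.concurrent.ScheduledExecutorService;
      import java.util.concurrent.TimeUnit;
      import java.util.function.Supplier;

      // Hypothetical sketch; class and method names are not taken from the patch.
      final class TokenRenewerSketch implements AutoCloseable {

          // Stand-in for the RM-local update path or a JM/TM RPC gateway.
          interface TokenReceiver {
              void updateTokens(byte[] serializedTokens);
          }

          // Stand-in for a DelegationTokenProvider that re-obtains tokens.
          private final Supplier<byte[]> tokenProvider;
          private final List<TokenReceiver> receivers = new CopyOnWriteArrayList<>();
          private final ScheduledExecutorService renewerThread =
                  Executors.newSingleThreadScheduledExecutor(r -> {
                      Thread t = new Thread(r, "delegation-token-renewer");
                      t.setDaemon(true);
                      return t;
                  });

          TokenRenewerSketch(Supplier<byte[]> tokenProvider) {
              this.tokenProvider = tokenProvider;
          }

          void register(TokenReceiver receiver) {
              receivers.add(receiver);
          }

          // One renewal round: re-obtain tokens, then broadcast them to every receiver.
          void renewOnce() {
              byte[] tokens = tokenProvider.get();
              for (TokenReceiver receiver : receivers) {
                  receiver.updateTokens(tokens.clone());
              }
          }

          // Repeats renewOnce() on the renewer thread with a fixed delay between rounds.
          void start(long delayMs) {
              renewerThread.scheduleWithFixedDelay(() -> {
                  try {
                      renewOnce();
                  } catch (RuntimeException e) {
                      // Swallow so that one failed round does not cancel later rounds.
                      System.err.println("Token renewal failed: " + e);
                  }
              }, delayMs, delayMs, TimeUnit.MILLISECONDS);
          }

          @Override
          public void close() {
              renewerThread.shutdownNow();
          }

          public static void main(String[] args) {
              try (TokenRenewerSketch renewer = new TokenRenewerSketch(
                      () -> "token-1".getBytes(StandardCharsets.UTF_8))) {
                  renewer.register(t -> System.out.println("RM got " + new String(t, StandardCharsets.UTF_8)));
                  renewer.register(t -> System.out.println("TM got " + new String(t, StandardCharsets.UTF_8)));
                  renewer.renewOnce();
              }
          }
      }
      ```

      In a real process each receiver would hand the tokens to UserGroupInformation; here the receivers only print what they get.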

      Lifecycle of the delegation token in the JM / TM:

      1. The TaskManager starts with the keytab stored in remote HDFS.
      2. On successful registration, the JM / TM receives the RM's current tokens, carried in JobMasterRegistrationSuccess / TaskExecutorRegistrationSuccess.
      3. The JM / TM process adds the new tokens to its context via UserGroupInformation.
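
      The worker side of this lifecycle might look like the sketch below. All names here are hypothetical: RegistrationSuccess stands in for JobMasterRegistrationSuccess / TaskExecutorRegistrationSuccess, and the in-memory field stands in for the process's UserGroupInformation context, so the example runs without Hadoop.

      ```java
      import java.util.Arrays;
      import java.util.concurrent.atomic.AtomicReference;

      // Hypothetical sketch of a JM/TM that receives tokens from the RM.
      final class WorkerTokenSketch {

          // Stand-in for the registration response that carries the RM's current tokens.
          static final class RegistrationSuccess {
              private final byte[] currentTokens;

              RegistrationSuccess(byte[] currentTokens) {
                  this.currentTokens = currentTokens.clone();
              }

              byte[] getCurrentTokens() {
                  return currentTokens.clone();
              }
          }

          // Stand-in for the tokens held in the process's security context.
          private final AtomicReference<byte[]> installed = new AtomicReference<>(new byte[0]);

          // Step 2: tokens arrive with the registration response.
          void onRegistrationSuccess(RegistrationSuccess response) {
              install(response.getCurrentTokens());
          }

          // Later rounds: tokens arrive through the RM's broadcast.
          void onTokenUpdate(byte[] tokens) {
              install(tokens);
          }

          private void install(byte[] tokens) {
              // Step 3: a real process would deserialize the bytes into Hadoop Credentials
              // and call UserGroupInformation.getCurrentUser().addCredentials(credentials).
              installed.set(tokens.clone());
          }

          byte[] installedTokens() {
              return installed.get().clone();
          }

          public static void main(String[] args) {
              WorkerTokenSketch worker = new WorkerTokenSketch();
              worker.onRegistrationSuccess(new RegistrationSuccess(new byte[] {1, 2}));
              worker.onTokenUpdate(new byte[] {3});
              System.out.println(Arrays.toString(worker.installedTokens()));
          }
      }
      ```

      Both entry points share one install path, so a worker that registers late ends up with the same tokens as one that received the broadcast.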

      Retrieving the ResourceManager leader through HAService is too heavyweight and unnecessary, so DelegationTokenManager is instantiated by the ResourceManager. This lets DelegationToken hold a direct reference to the ResourceManager instead of an RM RPCGateway or self gateway.

      2. Test

      1. No local JUnit test. Building a JUnit environment that includes a KDC and a local Hadoop is too heavyweight.
      2. Cluster test

      step 1: Specify a krb5.conf with a short token lifetime (ticket_lifetime, renew_lifetime) when submitting the Flink application.

      ```
      flink run .... -yD security.kerberos.token.renew.enabled=true -yD security.kerberos.krb5-conf.path=/home/work/krb5.conf -yD security.kerberos.login.use-ticket-cache=false ...
      ```
      step 2: Watch the logs for token identifier changes and for synchronization between the RM and the workers.

      In the RM / JM log:
      2022-06-28 15:13:03,509 INFO org.apache.flink.runtime.util.HadoopUtils [] - New token (HDFS_DELEGATION_TOKEN token 52101 for work on ha-hdfs:newfyyy) created in KerberosDelegationToken, and next schedule delay is 64799880 ms.
      2022-06-28 15:13:03,529 INFO org.apache.flink.runtime.util.HadoopUtils [] - Updating delegation tokens for current user.
      2022-06-28 15:13:04,729 INFO org.apache.flink.runtime.util.HadoopUtils [] - JobMaster receives new token (HDFS_DELEGATION_TOKEN token 52101 for work on ha-hdfs:newfyyy) from RM.


      2022-06-29 09:13:03,732 INFO org.apache.flink.runtime.util.HadoopUtils [] - New token (HDFS_DELEGATION_TOKEN token 52310 for work on ha-hdfs:newfyyy) created in KerberosDelegationToken, and next schedule delay is 64800045 ms.

      2022-06-29 09:13:03,805 INFO org.apache.flink.runtime.util.HadoopUtils [] - Updating delegation tokens for current user.
      2022-06-29 09:13:03,806 INFO org.apache.flink.runtime.util.HadoopUtils [] - JobMaster receives new token (HDFS_DELEGATION_TOKEN token 52310 for work on ha-hdfs:newfyyy) from RM.

      In the TM log:

      2022-06-28 15:13:17,983 INFO org.apache.flink.runtime.util.HadoopUtils [] - TaskManager receives new token (HDFS_DELEGATION_TOKEN token 52101 for work on ha-hdfs:newfyyy) from RM.
      2022-06-28 15:13:18,016 INFO org.apache.flink.runtime.util.HadoopUtils [] - Updating delegation tokens for current user.

      2022-06-29 09:13:03,809 INFO org.apache.flink.runtime.util.HadoopUtils [] - TaskManager receives new token (HDFS_DELEGATION_TOKEN token 52310 for work on ha-hdfs:newfyyy) from RM.

      2022-06-29 09:13:03,836 INFO org.apache.flink.runtime.util.HadoopUtils [] - Updating delegation tokens for current user.
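
      The logged "next schedule delay" values are close to 18 hours, and the two renewals above are 18 hours apart. That is consistent with scheduling the next renewal at 75% of a 24-hour token lifetime, but both the ratio and the lifetime are assumptions inferred from these logs, not values stated in this issue. Under those assumptions the arithmetic can be sketched as follows (RenewalDelaySketch and nextDelayMs are hypothetical names):

      ```java
      import java.time.Duration;

      // Hypothetical sketch; the 0.75 ratio is an assumption inferred from the logs.
      final class RenewalDelaySketch {

          // Assumed fraction of the remaining token lifetime to wait before renewing.
          static final double RENEWAL_RATIO = 0.75;

          // Delay until the next renewal, never negative even if the token has expired.
          static long nextDelayMs(long nowMs, long expiryMs) {
              long remainingMs = Math.max(0L, expiryMs - nowMs);
              return (long) (remainingMs * RENEWAL_RATIO);
          }

          public static void main(String[] args) {
              long lifetimeMs = Duration.ofHours(24).toMillis(); // assumed 24 h lifetime
              System.out.println(nextDelayMs(0L, lifetimeMs)); // prints 64800000
          }
      }
      ```

      The small offsets in the logged values (64799880 ms, 64800045 ms) would then come from the time elapsed between token issuance and scheduling, or from clock differences between hosts.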

      Attachments

        1. FLINK-28291.0001.patch
          81 kB
          jiulong.zhu

          People

            Assignee: Unassigned
            Reporter: jiulong.zhu