Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-13487

Hadoop KMS should load old delegation tokens from Zookeeper on startup

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: kms
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Configuration:
      CDH 5.5.1 (Hadoop 2.6+)
      KMS configured to store delegation tokens in Zookeeper
      DEBUG logging enabled in /etc/hadoop-kms/conf/kms-log4j.properties

      Findings:
      It seems to me delegation tokens never get cleaned up from Zookeeper past their renewal date. I can see in the logs that the removal thread is started with the expected interval:

      2016-08-11 08:15:24,511 INFO  AbstractDelegationTokenSecretManager - Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
      

      However, I don't see any delegation token removals, indicated by the following log message:
      org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager --> removeStoredToken(TokenIdent ident), line 769 [CDH]

          if (LOG.isDebugEnabled()) {
            LOG.debug("Removing ZKDTSMDelegationToken_"
                + ident.getSequenceNumber());
          }
      

      Meanwhile, I see a lot of expired delegation tokens in Zookeeper that don't get cleaned up.

      1. HADOOP-13487.01.patch
        8 kB
        Xiao Chen
      2. HADOOP-13487.02.patch
        8 kB
        Xiao Chen
      3. HADOOP-13487.03.patch
        8 kB
        Xiao Chen
      4. HADOOP-13487.04.patch
        8 kB
        Xiao Chen
      5. HADOOP-13487.05.patch
        8 kB
        Xiao Chen

        Issue Links

          Activity

          Hide
          xiaochen Xiao Chen added a comment -

          Thanks for reporting this, Alex Iakovlev.

          Just to clarify, the tokens will only be cleaned up after they reach max lifetime (7 days by default).

          If you're sure the tokens in zookeeper is expired, a workaround would be to remove them manually. But before that, would you mind capture a snapshot of zoookeeper's znodes here, for investigation?

          Show
          xiaochen Xiao Chen added a comment - Thanks for reporting this, Alex Iakovlev . Just to clarify, the tokens will only be cleaned up after they reach max lifetime (7 days by default). If you're sure the tokens in zookeeper is expired, a workaround would be to remove them manually. But before that, would you mind capture a snapshot of zoookeeper's znodes here, for investigation?
          Hide
          Aguinore Alex Iakovlev added a comment -

          I'm pretty sure I haven't report this

          Show
          Aguinore Alex Iakovlev added a comment - I'm pretty sure I haven't report this
          Hide
          xiaochen Xiao Chen added a comment -

          Oops, sorry Alex, I meant Alex Ivanov..

          Show
          xiaochen Xiao Chen added a comment - Oops, sorry Alex, I meant Alex Ivanov ..
          Hide
          axenol Alex Ivanov added a comment -

          Xiao Chen, here are some more details about the issue, and thank you for looking into it!

          ZK dtoken config for KMS (included for completeness):

            <!-- Configuration to store delegation tokens in zookeeper -->
            <property>
              <name>hadoop.kms.authentication.zk-dt-secret-manager.enable</name>
              <value>true</value>
              <description>
                Enables storage of delegation tokens in Zookeeper.
              </description>
            </property>
            <property>
              <name>hadoop.kms.authentication.zk-dt-secret-manager.znodeWorkingPath</name>
              <value>hadoop-kms-dt</value>
              <description>
                The znode path where KMS will store delegation tokens.
              </description>
            </property>
            <property>
              <name>hadoop.kms.authentication.zk-dt-secret-manager.zkConnectionString</name>
              <value>HOST1:2181,HOST2:2181,HOST3:2181</value>
              <description>
                The Zookeeper connection string: a list of hostnames and quorum port comma-separated.
              </description>
            </property>
            <property>
              <name>hadoop.kms.authentication.zk-dt-secret-manager.zkAuthType</name>
              <value>sasl</value>
              <description>
                The Zookeeper authentication type, 'none' or 'sasl' (Kerberos).
              </description>
            </property>
            <property>
              <name>hadoop.kms.authentication.zk-dt-secret-manager.kerberos.keytab</name>
              <value>/etc/hadoop-kms/conf/kms.keytab</value>
              <description>
                The absolute path for the Kerberos keytab with the credentials to
                connect to Zookeeper.
              </description>
            </property>
            <property>
              <name>hadoop.kms.authentication.zk-dt-secret-manager.kerberos.principal</name>
              <value>kms/HOST@BIGDATA</value>
              <description>
                The Kerberos service principal used to connect to Zookeeper.
              </description>
            </property>
          
            <!-- KMS delegation token configuration
                 Extend the lifetime of delegation tokens to support SPAS -->
            <property>
              <name>hadoop.kms.authentication.delegation-token.update-interval.sec</name>
              <value>1209600</value>
              <description>
                How often the master key is rotated, in seconds. Set to 2 weeks.
              </description>
            </property>
            <property>
              <name>hadoop.kms.authentication.delegation-token.max-lifetime.sec</name>
              <value>2419200</value>
              <description>
                Maximum lifetime of a delagation token, in seconds. Set to 4 weeks.
              </description>
            </property>
            <!-- Due to a bug in ZKDelegationTokenSecretManager.java (CDH 5.5.1), this needs to be in millis -->
            <!-- https://issues.apache.org/jira/browse/HADOOP-12659 -->
            <property>
              <name>hadoop.kms.authentication.delegation-token.renew-interval.sec</name>
              <value>1209600000</value>
              <description>
                Renewal interval of a delegation token, in seconds. Set to 2 weeks.
              </description>
            </property>
            <property>
              <name>hadoop.kms.authentication.delegation-token.removal-scan-interval.sec</name>
              <value>3600</value>
              <description>
                Scan interval to remove expired delegation tokens.
              </description>
            </property>
          

          Since I set delegation-token.renew-interval.sec to 2 weeks, I expect the tokens to be invalid after that time (NOTE: I account for HADOOP-12659 specifying the time in millis). There is no process renewing the tokens right now, but even if they were renewed, the maximum lifetime would be 4 weeks based on the setting.
          If I use zkCli to connect to one of the ZK servers, I see there are many delegation tokens (NOTE: I ran all commands today, 08/12/2016):

          [zk: HOST:2181(CONNECTED) 0] stat /hadoop-kms-dt/ZKDTSMRoot/ZKDTSMTokensRoot
          cZxid = 0x1002395a5
          ctime = Mon Jun 13 21:29:02 UTC 2016
          mZxid = 0x1002395a5
          mtime = Mon Jun 13 21:29:02 UTC 2016
          pZxid = 0x100501d21
          cversion = 109499
          dataVersion = 0
          aclVersion = 0
          ephemeralOwner = 0x0
          dataLength = 11
          numChildren = 103229
          

          As you can see, there are over 100k dtokens in that znode. Here's a sample old delegation token from June 29th:

          [zk: HOST:2181(CONNECTED) 2] get /hadoop-kms-dt/ZKDTSMRoot/ZKDTSMTokensRoot/DT_20000
          adminyarnoozie�U��V&�V+�f&�N  U�^&�DJ�=��}ؒ�R����
          cZxid = 0x10029f135
          ctime = Wed Jun 29 09:38:40 UTC 2016
          mZxid = 0x10029f135
          mtime = Wed Jun 29 09:38:40 UTC 2016
          pZxid = 0x10029f135
          cversion = 0
          dataVersion = 0
          aclVersion = 0
          ephemeralOwner = 0x0
          dataLength = 75
          numChildren = 0
          

          Note that the renewal time is NOT available to see in the zkCli console output. I had to write a small program to extract this datum from the znode. Here's the output of the custom program:

          >> ReadDelTokenFromZK 20000
          DT renew date: 1468402720294
          

          1468402720294 = GMT: Wed, 13 Jul 2016 09:38:40.294 GMT
          As you can see, the renewal date corresponds to the interval I've specified, i.e. 2 weeks (June 29th - July 13th).
          The only problem is, it is August 12th today, and the dtoken is still there, which leads me to believe KMS is NOT cleaning up old tokens.

          Show
          axenol Alex Ivanov added a comment - Xiao Chen , here are some more details about the issue, and thank you for looking into it! ZK dtoken config for KMS (included for completeness): <!-- Configuration to store delegation tokens in zookeeper --> <property> <name>hadoop.kms.authentication.zk-dt-secret-manager.enable</name> <value> true </value> <description> Enables storage of delegation tokens in Zookeeper. </description> </property> <property> <name>hadoop.kms.authentication.zk-dt-secret-manager.znodeWorkingPath</name> <value>hadoop-kms-dt</value> <description> The znode path where KMS will store delegation tokens. </description> </property> <property> <name>hadoop.kms.authentication.zk-dt-secret-manager.zkConnectionString</name> <value>HOST1:2181,HOST2:2181,HOST3:2181</value> <description> The Zookeeper connection string: a list of hostnames and quorum port comma-separated. </description> </property> <property> <name>hadoop.kms.authentication.zk-dt-secret-manager.zkAuthType</name> <value>sasl</value> <description> The Zookeeper authentication type, 'none' or 'sasl' (Kerberos). </description> </property> <property> <name>hadoop.kms.authentication.zk-dt-secret-manager.kerberos.keytab</name> <value>/etc/hadoop-kms/conf/kms.keytab</value> <description> The absolute path for the Kerberos keytab with the credentials to connect to Zookeeper. </description> </property> <property> <name>hadoop.kms.authentication.zk-dt-secret-manager.kerberos.principal</name> <value>kms/HOST@BIGDATA</value> <description> The Kerberos service principal used to connect to Zookeeper. </description> </property> <!-- KMS delegation token configuration Extend the lifetime of delegation tokens to support SPAS --> <property> <name>hadoop.kms.authentication.delegation-token.update-interval.sec</name> <value>1209600</value> <description> How often the master key is rotated, in seconds. Set to 2 weeks. </description> </property> <property> <name>hadoop.kms.authentication.delegation-token.max-lifetime.sec</name> <value>2419200</value> <description> Maximum lifetime of a delagation token, in seconds. Set to 4 weeks. </description> </property> <!-- Due to a bug in ZKDelegationTokenSecretManager.java (CDH 5.5.1), this needs to be in millis --> <!-- https: //issues.apache.org/jira/browse/HADOOP-12659 --> <property> <name>hadoop.kms.authentication.delegation-token.renew-interval.sec</name> <value>1209600000</value> <description> Renewal interval of a delegation token, in seconds. Set to 2 weeks. </description> </property> <property> <name>hadoop.kms.authentication.delegation-token.removal-scan-interval.sec</name> <value>3600</value> <description> Scan interval to remove expired delegation tokens. </description> </property> Since I set delegation-token.renew-interval.sec to 2 weeks, I expect the tokens to be invalid after that time (NOTE: I account for HADOOP-12659 specifying the time in millis). There is no process renewing the tokens right now, but even if they were renewed, the maximum lifetime would be 4 weeks based on the setting. If I use zkCli to connect to one of the ZK servers, I see there are many delegation tokens (NOTE: I ran all commands today, 08/12/2016): [zk: HOST:2181(CONNECTED) 0] stat /hadoop-kms-dt/ZKDTSMRoot/ZKDTSMTokensRoot cZxid = 0x1002395a5 ctime = Mon Jun 13 21:29:02 UTC 2016 mZxid = 0x1002395a5 mtime = Mon Jun 13 21:29:02 UTC 2016 pZxid = 0x100501d21 cversion = 109499 dataVersion = 0 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 11 numChildren = 103229 As you can see, there are over 100k dtokens in that znode. Here's a sample old delegation token from June 29th: [zk: HOST:2181(CONNECTED) 2] get /hadoop-kms-dt/ZKDTSMRoot/ZKDTSMTokensRoot/DT_20000 adminyarnoozie�U��V&�V+�f&�N U�^&�DJ�=��}ؒ�R���� cZxid = 0x10029f135 ctime = Wed Jun 29 09:38:40 UTC 2016 mZxid = 0x10029f135 mtime = Wed Jun 29 09:38:40 UTC 2016 pZxid = 0x10029f135 cversion = 0 dataVersion = 0 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 75 numChildren = 0 Note that the renewal time is NOT available to see in the zkCli console output. I had to write a small program to extract this datum from the znode. Here's the output of the custom program: >> ReadDelTokenFromZK 20000 DT renew date: 1468402720294 1468402720294 = GMT: Wed, 13 Jul 2016 09:38:40.294 GMT As you can see, the renewal date corresponds to the interval I've specified, i.e. 2 weeks (June 29th - July 13th). The only problem is, it is August 12th today, and the dtoken is still there, which leads me to believe KMS is NOT cleaning up old tokens.
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks Alex for the details!

          I confirm this is a bug in ZKDelegationTokenSecretManager, and the root cause is that when KMS is restarting, it's not actively loading existing znodes into its cache. Hence if the token is never accessed (e.g. after a cluster-wise restart), the znode is not managed by KMS, and eventually leaked.

          Obviously we can't have any logic depending on a KMS stop, since when zookeeper is used we're supposed to have multiple KMS instances.

          I can think of several options on fixing this:

          1. Always load up existing znodes to cache. This would be straightforward but may harm startup time.
          2. Have another background thread to periodically check znodes and remove expired ones.
          3. Have another process to do #2, so that we don't have to waste resource on multiple KMS instances to do the same clean up work.

          I'm thinking of having a modified #1. Specifically, on KMS restart, fire up a thread to get the znodes, and then iterate through it to remove the expired tokens. We can set a random delay on this background task after startup, to prevent multiple KMS instances racing on the same clean up work.

          Show
          xiaochen Xiao Chen added a comment - Thanks Alex for the details! I confirm this is a bug in ZKDelegationTokenSecretManager , and the root cause is that when KMS is restarting, it's not actively loading existing znodes into its cache. Hence if the token is never accessed (e.g. after a cluster-wise restart), the znode is not managed by KMS, and eventually leaked. Obviously we can't have any logic depending on a KMS stop, since when zookeeper is used we're supposed to have multiple KMS instances. I can think of several options on fixing this: Always load up existing znodes to cache. This would be straightforward but may harm startup time. Have another background thread to periodically check znodes and remove expired ones. Have another process to do #2, so that we don't have to waste resource on multiple KMS instances to do the same clean up work. I'm thinking of having a modified #1. Specifically, on KMS restart, fire up a thread to get the znodes, and then iterate through it to remove the expired tokens. We can set a random delay on this background task after startup, to prevent multiple KMS instances racing on the same clean up work.
          Hide
          xiaochen Xiao Chen added a comment -

          After further looking into this, I think we can simply go with option #1 above.
          On thread startup, PathChildrenCache need to load the znode anyways, which is the most time consuming operation.

          Patch 1 to express the idea, I will test it in a test cluster and update here.
          Benchmarked with 100k existing expired znodes, while kms start up takes minutes, the new node running in memory take about 2 seconds, which I think is fine.

          I intentionally ignored exceptions for compatibility - if the directory contains some znodes that can't be understood by ZKDTSM, KMS should still be able to start and run as normal.

          Show
          xiaochen Xiao Chen added a comment - After further looking into this, I think we can simply go with option #1 above. On thread startup, PathChildrenCache need to load the znode anyways, which is the most time consuming operation. Patch 1 to express the idea, I will test it in a test cluster and update here. Benchmarked with 100k existing expired znodes, while kms start up takes minutes, the new node running in memory take about 2 seconds, which I think is fine. I intentionally ignored exceptions for compatibility - if the directory contains some znodes that can't be understood by ZKDTSM, KMS should still be able to start and run as normal.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 12s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 44s trunk passed
          +1 compile 6m 47s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 54s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 18s trunk passed
          +1 javadoc 0m 44s trunk passed
          +1 mvninstall 0m 38s the patch passed
          +1 compile 6m 42s the patch passed
          -1 javac 6m 42s root generated 1 new + 709 unchanged - 1 fixed = 710 total (was 710)
          +1 checkstyle 0m 23s the patch passed
          +1 mvnsite 0m 52s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 26s the patch passed
          +1 javadoc 0m 45s the patch passed
          +1 unit 7m 46s hadoop-common in the patch passed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          37m 46s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824083/HADOOP-13487.01.patch
          JIRA Issue HADOOP-13487
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 75a210673a5f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 2353271
          Default Java 1.8.0_101
          findbugs v3.0.0
          javac https://builds.apache.org/job/PreCommit-HADOOP-Build/10275/artifact/patchprocess/diff-compile-javac-root.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10275/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10275/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 12s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 6m 44s trunk passed +1 compile 6m 47s trunk passed +1 checkstyle 0m 23s trunk passed +1 mvnsite 0m 54s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 1m 18s trunk passed +1 javadoc 0m 44s trunk passed +1 mvninstall 0m 38s the patch passed +1 compile 6m 42s the patch passed -1 javac 6m 42s root generated 1 new + 709 unchanged - 1 fixed = 710 total (was 710) +1 checkstyle 0m 23s the patch passed +1 mvnsite 0m 52s the patch passed +1 mvneclipse 0m 13s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 26s the patch passed +1 javadoc 0m 45s the patch passed +1 unit 7m 46s hadoop-common in the patch passed. +1 asflicense 0m 21s The patch does not generate ASF License warnings. 37m 46s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824083/HADOOP-13487.01.patch JIRA Issue HADOOP-13487 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 75a210673a5f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 2353271 Default Java 1.8.0_101 findbugs v3.0.0 javac https://builds.apache.org/job/PreCommit-HADOOP-Build/10275/artifact/patchprocess/diff-compile-javac-root.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10275/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10275/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          xiaochen Xiao Chen added a comment -

          Patch 2 to fix the javac warning. I've tested this in a cluster with 100k pre-existing token znodes. Startup took about a minute, and the new code took about 1 second.

          Appreciate any review / comments.

          Show
          xiaochen Xiao Chen added a comment - Patch 2 to fix the javac warning. I've tested this in a cluster with 100k pre-existing token znodes. Startup took about a minute, and the new code took about 1 second. Appreciate any review / comments.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 2s trunk passed
          +1 compile 6m 51s trunk passed
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 54s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 19s trunk passed
          +1 javadoc 0m 45s trunk passed
          +1 mvninstall 0m 37s the patch passed
          +1 compile 7m 59s the patch passed
          +1 javac 7m 59s the patch passed
          +1 checkstyle 0m 25s the patch passed
          +1 mvnsite 0m 57s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 47s the patch passed
          +1 javadoc 0m 51s the patch passed
          +1 unit 9m 15s hadoop-common in the patch passed.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          41m 34s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824433/HADOOP-13487.02.patch
          JIRA Issue HADOOP-13487
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 4b62a9047967 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / ae4db25
          Default Java 1.8.0_101
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10302/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10302/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 7m 2s trunk passed +1 compile 6m 51s trunk passed +1 checkstyle 0m 22s trunk passed +1 mvnsite 0m 54s trunk passed +1 mvneclipse 0m 12s trunk passed +1 findbugs 1m 19s trunk passed +1 javadoc 0m 45s trunk passed +1 mvninstall 0m 37s the patch passed +1 compile 7m 59s the patch passed +1 javac 7m 59s the patch passed +1 checkstyle 0m 25s the patch passed +1 mvnsite 0m 57s the patch passed +1 mvneclipse 0m 13s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 47s the patch passed +1 javadoc 0m 51s the patch passed +1 unit 9m 15s hadoop-common in the patch passed. +1 asflicense 0m 22s The patch does not generate ASF License warnings. 41m 34s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824433/HADOOP-13487.02.patch JIRA Issue HADOOP-13487 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 4b62a9047967 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / ae4db25 Default Java 1.8.0_101 findbugs v3.0.0 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10302/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10302/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          eddyxu Lei (Eddy) Xu added a comment -

          Hi, Xiao Chen

          The patch LGTM overall. But I am not very familiar with this area.

          The only small suggestion I have is that, we can probably create a help function for the cache force loading part for both keyCache and tokenCache.

                          LOG.info("Starting to load key cache.");
          365	        final List<ChildData> children = keyCache.getCurrentData();
          366	        int count = 0;
          367	        for (ChildData child : children) {
          368	          try {
          369	            processKeyAddOrUpdate(child.getData());
          370	          } catch (Exception e) {
          371	            LOG.info("Ignoring node {} because it failed to load.",
          372	                child.getPath());
          373	            LOG.debug("Failure exception:", e);
          374	            ++count;
          375	          }
          376	        }
          377	        if (count > 0) {
          378	          LOG.warn("Ignored {} nodes while loading keyCache.", count);
          379	        }
          380	        LOG.info("Loaded key cache.");
          

          +1 pending the changes.

          Show
          eddyxu Lei (Eddy) Xu added a comment - Hi, Xiao Chen The patch LGTM overall. But I am not very familiar with this area. The only small suggestion I have is that, we can probably create a help function for the cache force loading part for both keyCache and tokenCache. LOG.info( "Starting to load key cache." ); 365 final List<ChildData> children = keyCache.getCurrentData(); 366 int count = 0; 367 for (ChildData child : children) { 368 try { 369 processKeyAddOrUpdate(child.getData()); 370 } catch (Exception e) { 371 LOG.info( "Ignoring node {} because it failed to load." , 372 child.getPath()); 373 LOG.debug( "Failure exception:" , e); 374 ++count; 375 } 376 } 377 if (count > 0) { 378 LOG.warn( "Ignored {} nodes while loading keyCache." , count); 379 } 380 LOG.info( "Loaded key cache." ); +1 pending the changes.
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks for looking at this, Lei (Eddy) Xu!

          As explained offline, the call to processTokenAddOrUpdate or processKeyAddOrUpdate are different due to how the decoding is done, so a common method still looks a little messy. I don't feel strongly one way or the other.

          Patch 3 extracts the method, please take a look.

          Show
          xiaochen Xiao Chen added a comment - Thanks for looking at this, Lei (Eddy) Xu ! As explained offline, the call to processTokenAddOrUpdate or processKeyAddOrUpdate are different due to how the decoding is done, so a common method still looks a little messy. I don't feel strongly one way or the other. Patch 3 extracts the method, please take a look.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 49s trunk passed
          +1 compile 6m 45s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 54s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 1m 18s trunk passed
          +1 javadoc 0m 45s trunk passed
          +1 mvninstall 0m 38s the patch passed
          +1 compile 6m 43s the patch passed
          +1 javac 6m 43s the patch passed
          -0 checkstyle 0m 23s hadoop-common-project/hadoop-common: The patch generated 1 new + 47 unchanged - 0 fixed = 48 total (was 47)
          +1 mvnsite 0m 51s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 26s the patch passed
          +1 javadoc 0m 45s the patch passed
          +1 unit 8m 15s hadoop-common in the patch passed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          38m 25s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824673/HADOOP-13487.03.patch
          JIRA Issue HADOOP-13487
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux fd93f898d9f6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 2da32a6
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/10319/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10319/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10319/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 20s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 6m 49s trunk passed +1 compile 6m 45s trunk passed +1 checkstyle 0m 23s trunk passed +1 mvnsite 0m 54s trunk passed +1 mvneclipse 0m 12s trunk passed +1 findbugs 1m 18s trunk passed +1 javadoc 0m 45s trunk passed +1 mvninstall 0m 38s the patch passed +1 compile 6m 43s the patch passed +1 javac 6m 43s the patch passed -0 checkstyle 0m 23s hadoop-common-project/hadoop-common: The patch generated 1 new + 47 unchanged - 0 fixed = 48 total (was 47) +1 mvnsite 0m 51s the patch passed +1 mvneclipse 0m 12s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 26s the patch passed +1 javadoc 0m 45s the patch passed +1 unit 8m 15s hadoop-common in the patch passed. +1 asflicense 0m 21s The patch does not generate ASF License warnings. 38m 25s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824673/HADOOP-13487.03.patch JIRA Issue HADOOP-13487 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux fd93f898d9f6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 2da32a6 Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/10319/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10319/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10319/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          eddyxu Lei (Eddy) Xu added a comment -

          +1 thanks, Xiao Chen

          Show
          eddyxu Lei (Eddy) Xu added a comment - +1 thanks, Xiao Chen
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks Eddy!
          I fixed the typo for checkstyle in patch 4 as we talked offline. Will commit after jenkins come back.

          Show
          xiaochen Xiao Chen added a comment - Thanks Eddy! I fixed the typo for checkstyle in patch 4 as we talked offline. Will commit after jenkins come back.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 23s trunk passed
          +1 compile 7m 54s trunk passed
          +1 checkstyle 0m 25s trunk passed
          +1 mvnsite 0m 59s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 26s trunk passed
          +1 javadoc 0m 47s trunk passed
          +1 mvninstall 0m 38s the patch passed
          +1 compile 7m 38s the patch passed
          +1 javac 7m 38s the patch passed
          +1 checkstyle 0m 25s the patch passed
          +1 mvnsite 0m 53s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 26s the patch passed
          +1 javadoc 0m 45s the patch passed
          -1 unit 7m 41s hadoop-common in the patch failed.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          40m 51s



          Reason Tests
          Failed junit tests hadoop.security.token.delegation.TestZKDelegationTokenSecretManager



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824906/HADOOP-13487.04.patch
          JIRA Issue HADOOP-13487
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 8bd26e8d9d45 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / dc7a1c5
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/10334/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10334/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10334/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 18s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 7m 23s trunk passed +1 compile 7m 54s trunk passed +1 checkstyle 0m 25s trunk passed +1 mvnsite 0m 59s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 1m 26s trunk passed +1 javadoc 0m 47s trunk passed +1 mvninstall 0m 38s the patch passed +1 compile 7m 38s the patch passed +1 javac 7m 38s the patch passed +1 checkstyle 0m 25s the patch passed +1 mvnsite 0m 53s the patch passed +1 mvneclipse 0m 13s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 26s the patch passed +1 javadoc 0m 45s the patch passed -1 unit 7m 41s hadoop-common in the patch failed. +1 asflicense 0m 22s The patch does not generate ASF License warnings. 40m 51s Reason Tests Failed junit tests hadoop.security.token.delegation.TestZKDelegationTokenSecretManager Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824906/HADOOP-13487.04.patch JIRA Issue HADOOP-13487 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 8bd26e8d9d45 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / dc7a1c5 Default Java 1.8.0_101 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HADOOP-Build/10334/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10334/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10334/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          axenol Alex Ivanov added a comment -

          Thank you for submitting the patch, Xiao Chen. Can you please clarify why this loading of delegation tokens/keys is necessary? In my experience, the following workflow works, which gave me the impression that tokens from ZK are loaded in cache upon KMS start-up:
          1. Create a KMS delegation token
          2. Restart KMS
          3. Using same delegation token to authenticate still works

          Show
          axenol Alex Ivanov added a comment - Thank you for submitting the patch, Xiao Chen . Can you please clarify why this loading of delegation tokens/keys is necessary? In my experience, the following workflow works, which gave me the impression that tokens from ZK are loaded in cache upon KMS start-up: 1. Create a KMS delegation token 2. Restart KMS 3. Using same delegation token to authenticate still works
          Hide
          xiaochen Xiao Chen added a comment -

          Hi Alex Ivanov,
          Yes, the workflow works, because after restart, although the secret manager doesn't have the token in cache (currentTokens, it will fall back to read from zk. (code).

          The problem is, the token removal thread is only checking the in-memory cache. So if there's an old token in ZK and nobody is using it, it will not be loaded to currentTokens for the removal thread to process.

          Also, since we're already loading PathChildrenCache for tokens and master keys at startup, I think syncing the in-memory cache is the right thing to do.

          Show
          xiaochen Xiao Chen added a comment - Hi Alex Ivanov , Yes, the workflow works, because after restart, although the secret manager doesn't have the token in cache ( currentTokens , it will fall back to read from zk. ( code ). The problem is, the token removal thread is only checking the in-memory cache. So if there's an old token in ZK and nobody is using it, it will not be loaded to currentTokens for the removal thread to process. Also, since we're already loading PathChildrenCache for tokens and master keys at startup, I think syncing the in-memory cache is the right thing to do.
          Hide
          xiaochen Xiao Chen added a comment - - edited

          I enhanced the test to be more robust regarding token cancellation in patch 5. Had 100 runs locally and all passed.

          Show
          xiaochen Xiao Chen added a comment - - edited I enhanced the test to be more robust regarding token cancellation in patch 5. Had 100 runs locally and all passed.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 14s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 0s trunk passed
          +1 compile 6m 49s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 56s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 20s trunk passed
          +1 javadoc 0m 44s trunk passed
          +1 mvninstall 0m 38s the patch passed
          +1 compile 6m 43s the patch passed
          +1 javac 6m 43s the patch passed
          +1 checkstyle 0m 23s the patch passed
          +1 mvnsite 0m 51s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 27s the patch passed
          +1 javadoc 0m 45s the patch passed
          +1 unit 7m 51s hadoop-common in the patch passed.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          38m 14s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824920/HADOOP-13487.05.patch
          JIRA Issue HADOOP-13487
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 8528519634cc 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / dc7a1c5
          Default Java 1.8.0_101
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10336/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10336/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 14s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 7m 0s trunk passed +1 compile 6m 49s trunk passed +1 checkstyle 0m 23s trunk passed +1 mvnsite 0m 56s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 1m 20s trunk passed +1 javadoc 0m 44s trunk passed +1 mvninstall 0m 38s the patch passed +1 compile 6m 43s the patch passed +1 javac 6m 43s the patch passed +1 checkstyle 0m 23s the patch passed +1 mvnsite 0m 51s the patch passed +1 mvneclipse 0m 13s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 27s the patch passed +1 javadoc 0m 45s the patch passed +1 unit 7m 51s hadoop-common in the patch passed. +1 asflicense 0m 22s The patch does not generate ASF License warnings. 38m 14s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12824920/HADOOP-13487.05.patch JIRA Issue HADOOP-13487 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 8528519634cc 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / dc7a1c5 Default Java 1.8.0_101 findbugs v3.0.0 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10336/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10336/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          xiaochen Xiao Chen added a comment -

          The changes are trivial and test-only after Eddy's +1, so I'm committing this shortly.

          Show
          xiaochen Xiao Chen added a comment - The changes are trivial and test-only after Eddy's +1, so I'm committing this shortly.
          Hide
          xiaochen Xiao Chen added a comment -

          I have committed this to trunk, branch-2 and branch-2.8.

          Thanks Alex Ivanov for reporting this issue, and Lei (Eddy) Xu for reviewing!

          Show
          xiaochen Xiao Chen added a comment - I have committed this to trunk, branch-2 and branch-2.8. Thanks Alex Ivanov for reporting this issue, and Lei (Eddy) Xu for reviewing!
          Hide
          axenol Alex Ivanov added a comment -

          Thank you for clarifying, Xiao Chen, and for submitting a patch so promptly!

          Show
          axenol Alex Ivanov added a comment - Thank you for clarifying, Xiao Chen , and for submitting a patch so promptly!

            People

            • Assignee:
              xiaochen Xiao Chen
              Reporter:
              axenol Alex Ivanov
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development