Hadoop HDFS / HDFS-11210

Enhance key rolling to guarantee new KeyVersion is returned from generateEncryptedKeys after a key is rolled

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.5
    • Fix Version/s: 3.0.0-alpha4
    • Component/s: encryption, kms
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed
    • Release Note:

      An `invalidateCache` command has been added to the KMS.
      The `rollNewVersion` semantics of the KMS has been improved so that after a key's version is rolled, `generateEncryptedKey` of that key guarantees to return the `EncryptedKeyVersion` based on the new key version.

      Description

      To support re-encrypting EDEK, we need to make sure after a key is rolled, no old version EDEKs are used anymore. This includes various caches when generating EDEK.
      This is not currently the case, simply because there was no such requirement before.

      This includes

      • Client Provider(s), and corresponding cache(s).
        When LoadBalancingKMSCP is used, we need to clear all KMSCPs.
      • KMS server instance(s), and corresponding cache(s)
        When KMS HA is configured with multiple KMS instances, only one will receive the rollNewVersion request, so we need to make sure the other instances' caches are invalidated too.
      • The Client instance inside NN(s), and corresponding cache(s)
        When a hadoop key roll succeeds, the client provider inside the NN should be drained too.
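The fan-out required by the first bullet can be sketched with a simplified, hypothetical model (class and method names here are illustrative only, not the actual KMSClientProvider / LoadBalancingKMSClientProvider API):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: each provider keeps its own EDEK cache, so a
// load-balancing wrapper must invalidate the cache of *every* underlying
// provider, not just the one that served the rollNewVersion request.
class CachedProvider {
    private final Map<String, Deque<String>> edekCache = new ConcurrentHashMap<>();

    void fillCache(String keyName, String edek) {
        edekCache.computeIfAbsent(keyName, k -> new ArrayDeque<>()).add(edek);
    }

    // Serve from the cache if possible; otherwise pretend to generate an
    // EDEK with the current (new) key version.
    String generateEncryptedKey(String keyName) {
        Deque<String> q = edekCache.get(keyName);
        return (q != null && !q.isEmpty()) ? q.poll() : keyName + "@newVersion";
    }

    void invalidateCache(String keyName) {
        edekCache.remove(keyName);
    }
}

class LoadBalancingProvider {
    private final List<CachedProvider> providers;

    LoadBalancingProvider(List<CachedProvider> providers) {
        this.providers = providers;
    }

    // Fan the invalidation out to all underlying providers.
    void invalidateCache(String keyName) {
        for (CachedProvider p : providers) {
            p.invalidateCache(keyName);
        }
    }
}
```

Without the fan-out, any provider whose cache was not drained would keep returning EDEKs encrypted with the old key version.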
      1. HDFS-11210.01.patch
        21 kB
        Xiao Chen
      2. HDFS-11210.02.patch
        34 kB
        Xiao Chen
      3. HDFS-11210.03.patch
        36 kB
        Xiao Chen
      4. HDFS-11210.04.patch
        37 kB
        Xiao Chen
      5. HDFS-11210.05.patch
        37 kB
        Xiao Chen

        Issue Links

          Activity

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11221 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11221/)
          HDFS-11210. Enhance key rolling to guarantee new KeyVersion is returned (xiao: rev 2007e0cf2ad371e2dbf533c367f09c1f5acd1c0b)

          • (edit) hadoop-common-project/hadoop-kms/src/main/java/org/apache/hadoop/crypto/key/kms/server/KeyAuthorizationKeyProvider.java
          • (edit) hadoop-common-project/hadoop-kms/src/test/java/org/apache/hadoop/crypto/key/kms/server/TestKMSAudit.java
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSRESTConstants.java
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyProvider.java
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/ValueQueue.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyProviderExtension.java
          • (edit) hadoop-common-project/hadoop-kms/src/main/java/org/apache/hadoop/crypto/key/kms/server/EagerKeyGeneratorKeyProviderCryptoExtension.java
          • (edit) hadoop-common-project/hadoop-kms/src/test/java/org/apache/hadoop/crypto/key/kms/server/TestKMS.java
          • (edit) hadoop-common-project/hadoop-kms/src/site/markdown/index.md.vm
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java
          • (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/TestKeyShell.java
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/CachingKeyProvider.java
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/KeyShell.java
          • (edit) hadoop-common-project/hadoop-kms/src/main/java/org/apache/hadoop/crypto/key/kms/server/KMS.java
          xiaochen Xiao Chen added a comment -

          Added the one line comment on top of patch 5, and committed to trunk.

          Thanks Andrew for the great comments!

          xiaochen Xiao Chen added a comment -

          Thanks a lot Andrew! The test failure looks unrelated and passes locally.
          Will be adding the code comment and committing soon.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          0 mvndep 0m 27s Maven dependency ordering for branch
          +1 mvninstall 16m 16s trunk passed
          +1 compile 14m 50s trunk passed
          +1 checkstyle 1m 43s trunk passed
          +1 mvnsite 3m 6s trunk passed
          +1 mvneclipse 0m 58s trunk passed
          +1 findbugs 4m 6s trunk passed
          +1 javadoc 1m 56s trunk passed
          0 mvndep 0m 15s Maven dependency ordering for patch
          +1 mvninstall 1m 53s the patch passed
          +1 compile 12m 18s the patch passed
          +1 javac 12m 18s the patch passed
          +1 checkstyle 1m 44s root: The patch generated 0 new + 221 unchanged - 2 fixed = 221 total (was 223)
          +1 mvnsite 2m 54s the patch passed
          +1 mvneclipse 0m 58s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 4m 39s the patch passed
          +1 javadoc 1m 59s the patch passed
          +1 unit 8m 29s hadoop-common in the patch passed.
          +1 unit 3m 8s hadoop-kms in the patch passed.
          -1 unit 75m 21s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 53s The patch does not generate ASF License warnings.
          159m 41s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-11210
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12851477/HDFS-11210.05.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 672bc5a5a0a1 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / deb368b
          Default Java 1.8.0_121
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18339/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18339/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms hadoop-hdfs-project/hadoop-hdfs U: .
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18339/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          andrew.wang Andrew Wang added a comment -

          Gotcha, I figured it was related to negative numbers, though my quick tests with python returned positive numbers. I guess Java's behavior is different; definitely a quirky area for languages.

          Anyway I'm +1 pending Jenkins (and the comment improvement if you want). Great work here!

          xiaochen Xiao Chen added a comment -

          Thanks Andrew, if you have any other comments, please don't hold back.

          Noticed you changed indexFor from modulo to a mask. What was the reason for this? The masking only works when the array size is a power of 2.

          Because I forgot to set the sign bit to 0, hence all those test failures when an int hashCode returns a negative value. Instead of doing modulo and then zeroing out the sign bit, I figured masking once is more elegant.
          True, it'll be problematic when the array size isn't a power of 2; maybe add a comment for the future, since the array size is currently hardcoded?
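The sign-bit issue can be illustrated with a small standalone example (not the patch code): Java's % operator keeps the sign of the dividend, so a negative hashCode must have its sign bit cleared before modulo, whereas a single mask does both jobs at once, provided the table size is a power of two.

```java
// Standalone illustration of the two indexFor variants discussed above.
public class IndexForDemo {
    static final int SIZE = 16; // must be a power of two for the mask trick

    // Modulo variant: the sign bit has to be cleared first, otherwise a
    // negative hashCode produces a negative index.
    static int indexForModulo(int hash) {
        return (hash & 0x7fffffff) % SIZE;
    }

    // Mask variant: clears the sign bit and reduces to [0, SIZE) in one
    // operation -- but only because SIZE is a power of two.
    static int indexForMask(int hash) {
        return hash & (SIZE - 1);
    }
}
```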

          And finally we have the item in NN:

          The Client instance inside NN(s), and corresponding cache(s). When hadoop key roll is succeeded, the client provider inside NN should be drained too.

          Still think my 1st comment of invalidateCache when receiving re-encrypt is the way to go, and should probably add that line to the mighty HDFS-10899....

          andrew.wang Andrew Wang added a comment -

          LGTM overall, I think we can just commit this as is if you want.

          • IMO making the # of locks configurable is optional. The ideal number is typically a multiple of the # of cores, and 16 seems like a very reasonable default considering we also typically don't have that many keys.
          • Noticed you changed indexFor from modulo to a mask. What was the reason for this? The masking only works when the array size is a power of 2.
          xiaochen Xiao Chen added a comment -

          Patch 5 to implement indexFor better

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          0 mvndep 0m 18s Maven dependency ordering for branch
          +1 mvninstall 13m 30s trunk passed
          +1 compile 13m 28s trunk passed
          +1 checkstyle 1m 37s trunk passed
          +1 mvnsite 3m 3s trunk passed
          +1 mvneclipse 0m 51s trunk passed
          +1 findbugs 4m 11s trunk passed
          +1 javadoc 1m 56s trunk passed
          0 mvndep 0m 15s Maven dependency ordering for patch
          +1 mvninstall 2m 0s the patch passed
          +1 compile 11m 7s the patch passed
          +1 javac 11m 7s the patch passed
          -0 checkstyle 1m 37s root: The patch generated 2 new + 220 unchanged - 2 fixed = 222 total (was 222)
          +1 mvnsite 2m 47s the patch passed
          +1 mvneclipse 0m 55s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 4m 44s the patch passed
          +1 javadoc 1m 58s the patch passed
          +1 unit 8m 4s hadoop-common in the patch passed.
          +1 unit 3m 12s hadoop-kms in the patch passed.
          -1 unit 95m 38s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 40s The patch does not generate ASF License warnings.
          173m 35s



          Reason Tests
          Failed junit tests hadoop.hdfs.TestTrashWithSecureEncryptionZones
            hadoop.hdfs.TestEncryptionZonesWithKMS
            hadoop.hdfs.server.datanode.checker.TestThrottledAsyncChecker
            hadoop.hdfs.server.datanode.TestDirectoryScanner
            hadoop.hdfs.TestSecureEncryptionZoneWithKMS
          Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-11210
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12851351/HDFS-11210.04.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux a606ad74b911 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 9dbfab1
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/18336/artifact/patchprocess/diff-checkstyle-root.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18336/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18336/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms hadoop-hdfs-project/hadoop-hdfs U: .
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18336/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          Thanks a lot for the review Andrew! Exactly the 'quality review' I mentioned in the last comment.

          locks

          Used a fixed-size preallocated array of locks.
          Researched a bit, but I think we should be able to live with fixed-size locks in this case. Resizing an array is easy, but seems a little messy to do for the array of striped locks without a synchronized block on the ValueQueue object. (IIUC we need to write-lock all stripes before switching to a new-sized array, to prevent a key that's locked at index 1 from being left unprotected when it moves to index 2.)
          If this looks good to you, I can make the size configurable (and potentially modify ValueQueue to use a builder instead of adding a 7th param to the ctor)

          Do you know why UniqueKeyBlockingQueue#put is synchronized, but the other methods aren't?

          Good q, I'm not sure. Looking at the code + comments, the only explanation IMHO is that we want to make sure no more than 1 runnable per key is created. And the only thing the async thread does is put, so it kind of makes sense... We could ping the authors for the answer.
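A minimal standalone sketch of that reading (names are illustrative, not the actual UniqueKeyBlockingQueue): synchronizing the enqueue makes the "is a task already pending for this key?" check and the enqueue atomic, so at most one pending runnable can exist per key.

```java
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of a "unique key" queue: putIfAbsent is synchronized
// so the membership check and the enqueue cannot interleave, guaranteeing
// at most one pending entry per key.
class UniqueKeyQueue {
    private final Set<String> keysInProgress = new HashSet<>();
    private final Queue<String> queue = new LinkedList<>();

    // Returns false if a task for this key is already pending.
    synchronized boolean putIfAbsent(String keyName) {
        if (!keysInProgress.add(keyName)) {
            return false;
        }
        queue.add(keyName);
        return true;
    }

    // Removes the head entry and clears its pending marker.
    synchronized String take() {
        String k = queue.poll();
        if (k != null) {
            keysInProgress.remove(k);
        }
        return k;
    }
}
```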

          it looks like we'll invalidate multiple times since there's an invalidateCache call in both KMSCP and LBKMSCP. Do you think we can remove the invalidates in LBKMSCP?

          I don't think it'd be safe to remove it from LBKMSCP, because ValueQueue is per KMSCP. So one KMSCP#rollNewVersion call should really invalidate all KMSCPs' caches for an LBKMSCP group.
          We may be able to invalidate only the rest of the KMSCPs in the LBKMSCP group, so each KMSCP invalidates only once. But I don't think this double invalidation will practically create problems, so I'd like to go with the current KISS approach for code cleanliness. Thoughts?

          Other comments are all addressed.

          andrew.wang Andrew Wang added a comment -

          Thanks for the rev Xiao, looks pretty close, mostly nitty stuff and questions:

          • index.md.vm, new entry on getMetadata looks incomplete?
          • TestKMS, typo MockVersinoName
          • Since readLock and writeLock both need to call into a synchronized block to do anything, it might limit performance. I don't know if ValueQueue is actually performance critical, but one approach to solve this is preallocating an array of locks and using striped locking. Essentially, hash the key, and use the lock at that array location. This spreads out load across the locks well assuming the hash function is good. Resizing the array of locks is an exercise left to the reader.
          • Are the createLockIfNonexist calls in readLock and writeLock sufficient for safety, or do we need the other calls too? Would prefer to remove the other calls for clarity if they're not necessary.
          • Do you know why UniqueKeyBlockingQueue#put is synchronized, but the other methods aren't?
          • KMSClientProvider#invalidateCache, little nit, please move the comment up above where we invalidate the server cache
          • Final q, it looks like we'll invalidate multiple times since there's an invalidateCache call in both KMSCP and LBKMSCP. Do you think we can remove the invalidates in LBKMSCP?
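The striped-locking suggestion above can be sketched as follows (a hypothetical minimal version, not the code in the patch): preallocate a fixed array of locks and pick one by hashing the key name.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Minimal striped-locking sketch: a fixed, preallocated array of locks,
// indexed by the key's hash, so different keys rarely contend on the
// same lock.
class StripedLocks {
    private final ReadWriteLock[] stripes;

    StripedLocks(int count) {
        // count must be a power of two for the mask below to work.
        stripes = new ReentrantReadWriteLock[count];
        for (int i = 0; i < count; i++) {
            stripes[i] = new ReentrantReadWriteLock();
        }
    }

    // The mask also clears the sign bit of a negative hashCode.
    ReadWriteLock lockFor(String keyName) {
        return stripes[keyName.hashCode() & (stripes.length - 1)];
    }
}
```

The same key always maps to the same stripe, so per-key mutual exclusion is preserved while unrelated keys proceed in parallel.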
          xiaochen Xiao Chen added a comment -

          Patch 3 to fix the 2 related test failures: invalidate the cache after rollover in CachingKeyProvider; don't getCurrentKey when logging invalidateCache, since that's under another ACL...

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          0 mvndep 0m 15s Maven dependency ordering for branch
          +1 mvninstall 13m 1s trunk passed
          +1 compile 13m 24s trunk passed
          +1 checkstyle 1m 36s trunk passed
          +1 mvnsite 2m 58s trunk passed
          +1 mvneclipse 0m 56s trunk passed
          +1 findbugs 3m 46s trunk passed
          +1 javadoc 1m 52s trunk passed
          0 mvndep 0m 16s Maven dependency ordering for patch
          +1 mvninstall 1m 43s the patch passed
          +1 compile 11m 11s the patch passed
          +1 javac 11m 11s the patch passed
          +1 checkstyle 1m 36s root: The patch generated 0 new + 219 unchanged - 2 fixed = 219 total (was 221)
          +1 mvnsite 2m 49s the patch passed
          +1 mvneclipse 0m 57s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 4m 5s the patch passed
          +1 javadoc 1m 50s the patch passed
          -1 unit 8m 23s hadoop-common in the patch failed.
          -1 unit 3m 12s hadoop-kms in the patch failed.
          -1 unit 113m 6s hadoop-hdfs in the patch failed.
          +1 asflicense 1m 2s The patch does not generate ASF License warnings.
          189m 38s



          Reason Tests
          Failed junit tests hadoop.crypto.key.TestCachingKeyProvider
            hadoop.crypto.key.kms.server.TestKMS
            hadoop.hdfs.TestReadStripedFileWithMissingBlocks
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
          Timed out junit tests org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HDFS-11210
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12850334/HDFS-11210.02.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 48cae620810e 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / bec9b7a
          Default Java 1.8.0_121
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18303/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18303/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-kms.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/18303/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/18303/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms hadoop-hdfs-project/hadoop-hdfs U: .
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/18303/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          Thanks for the review Andrew, I really appreciate the quality of your review.

          Good call on "invalidateCache"; I don't know what I was thinking when naming it...

          High-level, is KeyProvider#clearCache intended for use by end-users?

          Since invalidateCache is added to the KMS and the KeyProvider API, end-users can definitely use it. I tried to hide this detail as much as possible, but agree that a hadoop key command addition would be handy and does no harm. So I added this and a test in patch 2.

          All comments should be addressed, I think. I also added a new test in TestKMS with a mock, to fail before and pass after the invalidation. The rollover draining test is racy by nature, so IMHO the separate mock test is clearer.
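          The fail-before/pass-after pattern can be sketched with a minimal mock. The class and method names below are illustrative stand-ins, not the actual TestKMS or KMS client provider code:

```java
// Minimal mock sketch (illustrative names, not the real Hadoop classes):
// after rollNewVersion, generateEncryptedKey still serves the stale cached
// version until invalidateCache is called, after which it must serve the
// newly rolled version.
class MockKeyProvider {
  private String latestVersion = "key@0";  // what the KMS now holds
  private String cachedVersion = "key@0";  // what the client cache holds

  // Roll the key on the server side; the client cache is untouched.
  void rollNewVersion() {
    latestVersion = "key@1";
  }

  // Generates from the cache, so it can lag behind the server.
  String generateEncryptedKey() {
    return cachedVersion;
  }

  // Drop the stale entry and pick up the latest version.
  void invalidateCache() {
    cachedVersion = latestVersion;
  }
}
```

          A mock like this tests the invalidation contract deterministically, without the race inherent in draining a live rollover.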

          andrew.wang Andrew Wang added a comment -

          Thanks for working on this Xiao, and sorry for the slow review. Comments and questions:

          High-level, is KeyProvider#clearCache intended for use by end-users? It looks like we call clear after rollover in LBKMSCP automatically, but I could see a "hadoop key" command being useful if there was a failure and you want to retry.

          • I'd prefer "invalidate" to "clear", since "cache invalidation" is a known term. e.g. invalidateCache, INVALIDATE_CACHE, etc.
          • KMS.java, the debug log says "Rolling key with name", should probably say "Invalidating cache for key with name" ?
          • KMSCP, why was the drain in the rollover removed? I see a drain was added in LBKMSCP, but what if we're not using that wrapper?
          • KMSCP, since clearCache isn't implemented, users can't call this except via rollover (which calls clearServerCache). Is this intentional? I removed the clearCache in TestKMS and it still passed.
          • KeyProvider, should specify what the param is for clearCache in the Javadoc. Also I think you meant "return the new key version" at the end of the description.
          • ValueQueue, you could have readLock(keyName) readUnlock(keyName) wrappers to make it more concise, and also create the lock automatically.
          • ValueQueue, do we need to worry about races to create locks in keyQueueLocks?
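          One way to get both the concise wrappers and race-free lock creation is ConcurrentHashMap.computeIfAbsent, which creates each key's lock atomically. This is a hypothetical sketch of the idea, not the actual ValueQueue code; the names keyQueueLocks, readLock, and readUnlock are taken from the comments above:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Per-key read/write locks for a value queue. Two threads racing to lock
// the same key both go through computeIfAbsent, so they always observe the
// same ReadWriteLock instance and no lock is ever created twice.
class KeyQueueLocks {
  private final Map<String, ReadWriteLock> keyQueueLocks =
      new ConcurrentHashMap<>();

  // Creates the lock on first use, atomically.
  private ReadWriteLock lockFor(String keyName) {
    return keyQueueLocks.computeIfAbsent(
        keyName, k -> new ReentrantReadWriteLock());
  }

  // Concise wrappers so call sites need not fetch the lock themselves.
  void readLock(String keyName) {
    lockFor(keyName).readLock().lock();
  }

  void readUnlock(String keyName) {
    lockFor(keyName).readLock().unlock();
  }

  void writeLock(String keyName) {
    lockFor(keyName).writeLock().lock();
  }

  void writeUnlock(String keyName) {
    lockFor(keyName).writeLock().unlock();
  }
}
```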
          xiaochen Xiao Chen added a comment -

          Preliminary patch 1 to express the idea. This should take care of all things KMS.

          Regarding the last point about the provider instance (or specifically, KeyProviderCryptoExtension) inside NN:
          since no one else is aware of this client, I'm thinking that when a re-encrypt command is submitted, we need to get the key name and always do a clearCache. Considering that re-encrypt should be an infrequent command executed during a maintenance window, this should be acceptable. Better approaches welcome!
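          The clearCache-always flow described above can be sketched as follows. Class and method names are hypothetical stand-ins for the NameNode's caching provider, not the actual Hadoop API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a re-encrypt command looks up the zone's key name
// and unconditionally invalidates the provider's cache for it first, so
// the EDEKs generated afterwards are based on the new key version.
class ReEncryptFlow {
  // Stand-in for the caching provider (KeyProviderCryptoExtension) in the NN,
  // mapping key name -> cached key version used to generate EDEKs.
  private final Map<String, String> edekCache = new HashMap<>();

  void cacheEdek(String keyName, String keyVersion) {
    edekCache.put(keyName, keyVersion);
  }

  String cachedEdekVersion(String keyName) {
    return edekCache.get(keyName);
  }

  // Entry point for the hypothetical re-encrypt command.
  void reencryptZone(String keyName) {
    // Always clear first: since no one else is aware of this client,
    // this is the only way to guarantee no old-version EDEKs are used.
    edekCache.remove(keyName);
    // ... then generate fresh EDEKs for every file in the zone ...
  }
}
```

          Since re-encryption is infrequent, the cost of an unconditional cache drop here is negligible compared to the risk of re-encrypting with a stale key version.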


            People

            • Assignee: Xiao Chen
            • Reporter: Xiao Chen
            • Votes: 0
            • Watchers: 7