Hadoop Common / HADOOP-14563

LoadBalancingKMSClientProvider#warmUpEncryptedKeys swallows IOException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.1
    • Fix Version/s: 2.9.0, 3.0.0-beta1, 2.8.2
    • Component/s: None
    • Labels: None

      Description

      TestAclsEndToEnd is failing consistently in HADOOP-14521.
      The cause is that LoadBalancingKMSClientProvider#warmUpEncryptedKeys swallows the IOException, whereas KMSClientProvider#warmUpEncryptedKeys propagates it all the way back to createEncryptionZone, so creation of the encryption zone (EZ) fails.
      The relevant code snippets follow.

      KMSClientProvider.java
        @Override
        public void warmUpEncryptedKeys(String... keyNames)
            throws IOException {
          try {
            encKeyVersionQueue.initializeQueuesForKeys(keyNames);
          } catch (ExecutionException e) {
            throw new IOException(e);
          }
        }
      
      LoadBalancingKMSClientProvider.java
         // This request is sent to all providers in the load-balancing group
        @Override
        public void warmUpEncryptedKeys(String... keyNames) throws IOException {
          for (KMSClientProvider provider : providers) {
            try {
              provider.warmUpEncryptedKeys(keyNames);
            } catch (IOException ioe) {
              LOG.error(
                  "Error warming up keys for provider with url"
                  + "[" + provider.getKMSUrl() + "]", ioe);
            }
          }
        }
      

      In HADOOP-14521, I intend to always instantiate LoadBalancingKMSClientProvider, even if there is only one provider, so that retries can be applied in only one place.
      We need to decide whether both cases should fail or both should continue.

      1. HADOOP-14563.patch
        5 kB
        Rushabh S Shah
      2. HADOOP-14563-1.patch
        5 kB
        Rushabh S Shah
      3. HADOOP-14563-2.patch
        5 kB
        Rushabh S Shah

        Activity

        jojochuang Wei-Chiu Chuang added a comment -

        Sounds like LBKMSCP should throw an exception when all providers throw exception?

        shahrs87 Rushabh S Shah added a comment -

        Sounds like LBKMSCP should throw an exception when all providers throw exception?

        Thanks Wei-Chiu Chuang for the comment.
        I did the same thing as you suggested.
        Attaching the patch.
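
        A minimal sketch of the behavior suggested above: attempt every provider, remember the last failure, and rethrow only when no provider succeeded. This is not the attached patch. `WarmUpProvider`, `AllFailWarmUp` and `warmUpAll` are hypothetical names standing in for the Hadoop classes, and the returned success count exists only to make the sketch easy to check.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for KMSClientProvider; not the real Hadoop class. */
interface WarmUpProvider {
  void warmUpEncryptedKeys(String... keyNames) throws IOException;
}

final class AllFailWarmUp {
  /**
   * Sends the warm-up request to every provider.
   * Rethrows the last IOException only if no provider succeeded.
   * Returns the number of providers that were warmed up.
   */
  static int warmUpAll(List<WarmUpProvider> providers, String... keyNames)
      throws IOException {
    int succeeded = 0;
    IOException last = null;
    for (WarmUpProvider provider : providers) {
      try {
        provider.warmUpEncryptedKeys(keyNames);
        succeeded++;
      } catch (IOException ioe) {
        last = ioe; // assumption: real code would also log the failure here
      }
    }
    if (succeeded == 0 && last != null) {
      throw last;
    }
    return succeeded;
  }

  public static void main(String[] args) throws IOException {
    WarmUpProvider ok = keys -> { };
    WarmUpProvider bad = keys -> { throw new IOException("KMS down"); };

    // One provider succeeds, so the failure of the other is tolerated.
    System.out.println(warmUpAll(Arrays.asList(ok, bad), "key1"));

    // Every provider fails, so the exception reaches the caller.
    try {
      warmUpAll(Arrays.asList(bad, bad), "key1");
    } catch (IOException e) {
      System.out.println("all providers failed: " + e.getMessage());
    }
  }
}
```

        With this shape, a partial failure still leaves some providers cold, which is the trade-off discussed later in this thread.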

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 3m 13s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 13m 26s trunk passed
        +1 compile 15m 7s trunk passed
        +1 checkstyle 0m 34s trunk passed
        +1 mvnsite 1m 14s trunk passed
        +1 findbugs 1m 36s trunk passed
        +1 javadoc 0m 54s trunk passed
        +1 mvninstall 0m 49s the patch passed
        +1 compile 12m 30s the patch passed
        +1 javac 12m 30s the patch passed
        -0 checkstyle 0m 32s hadoop-common-project/hadoop-common: The patch generated 20 new + 10 unchanged - 0 fixed = 30 total (was 10)
        +1 mvnsite 1m 1s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 39s the patch passed
        +1 javadoc 0m 46s the patch passed
        +1 unit 7m 52s hadoop-common in the patch passed.
        +1 asflicense 0m 27s The patch does not generate ASF License warnings.
        63m 19s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HADOOP-14563
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12874960/HADOOP-14563.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 1710a3be6784 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 990aa34
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12652/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt
        Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12652/testReport/
        modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12652/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        shahrs87 Rushabh S Shah added a comment -

        Fixed the checkstyle warnings.
        Wei-Chiu Chuang: mind giving it a quick review?

        jojochuang Wei-Chiu Chuang added a comment -

        Looks mostly good.
        In test testWarmUpEncryptedKeysWhenAllProvidersFail

        fail("Should fail since provider p1 threw IOException");
        

        I think you meant to say "since both providers threw IOException"?

        I am +1 after fixing this and checkstyle warnings.
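
        For readers unfamiliar with the pattern under review, here is a rough sketch of an expect-failure test. The failure message is reached only when no exception was thrown, so it should name the condition that was supposed to trigger one. The sketch avoids a JUnit dependency by throwing AssertionError directly; `ExpectFailureSketch` and `warmUpWithAllProvidersDown` are hypothetical names, not code from the patch.

```java
import java.io.IOException;

final class ExpectFailureSketch {
  /** Hypothetical operation under test: fails as if every provider were down. */
  static void warmUpWithAllProvidersDown() throws IOException {
    throw new IOException("all providers failed");
  }

  /** Returns the caught message; throws AssertionError if nothing was thrown. */
  static String expectIOException() {
    try {
      warmUpWithAllProvidersDown();
      // Reached only if the call above did NOT throw.
      throw new AssertionError(
          "Should fail since both providers threw IOException");
    } catch (IOException expected) {
      return expected.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(expectIOException());
  }
}
```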

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 11s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 15m 27s trunk passed
        +1 compile 15m 12s trunk passed
        +1 checkstyle 0m 38s trunk passed
        +1 mvnsite 1m 8s trunk passed
        +1 findbugs 1m 32s trunk passed
        +1 javadoc 0m 49s trunk passed
        +1 mvninstall 0m 42s the patch passed
        +1 compile 11m 3s the patch passed
        +1 javac 11m 3s the patch passed
        +1 checkstyle 0m 40s the patch passed
        +1 mvnsite 1m 14s the patch passed
        -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
        +1 findbugs 1m 52s the patch passed
        +1 javadoc 0m 59s the patch passed
        -1 unit 8m 36s hadoop-common in the patch failed.
        +1 asflicense 0m 40s The patch does not generate ASF License warnings.
        62m 34s



        Reason Tests
        Failed junit tests hadoop.net.TestDNS



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HADOOP-14563
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12875113/HADOOP-14563-1.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 68520557cf3e 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 441378e
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        whitespace https://builds.apache.org/job/PreCommit-HADOOP-Build/12664/artifact/patchprocess/whitespace-eol.txt
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12664/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
        Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12664/testReport/
        modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12664/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        shahrs87 Rushabh S Shah added a comment -

        Attaching revised patch addressing Wei-Chiu Chuang's comment and fixing white space warning.
        Pasting the diff for quick reference.

        diff ~/patches/jira/HADOOP-14563-1.patch ~/patches/jira/HADOOP-14563-2.patch 
        40c40
        < index 499b991..51d79f9 100644
        ---
        > index 499b991..d14dd59 100644
        79c79
        < +      fail("Should fail since provider p1 threw IOException");
        ---
        > +      fail("Should fail since both providers threw IOException");
        93c93
        < +  public void testWarmUpEncryptedKeysWhenOneProviderSucceeds() 
        ---
        > +  public void testWarmUpEncryptedKeysWhenOneProviderSucceeds()
        

        TestDNS test failure is unrelated to this patch.
        It passes on my local box.

        Please review.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 13s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 13m 46s trunk passed
        +1 compile 14m 53s trunk passed
        +1 checkstyle 0m 33s trunk passed
        +1 mvnsite 1m 7s trunk passed
        +1 findbugs 1m 31s trunk passed
        +1 javadoc 0m 48s trunk passed
        +1 mvninstall 0m 40s the patch passed
        +1 compile 11m 9s the patch passed
        +1 javac 11m 9s the patch passed
        +1 checkstyle 0m 30s the patch passed
        +1 mvnsite 1m 2s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 37s the patch passed
        +1 javadoc 0m 48s the patch passed
        -1 unit 8m 27s hadoop-common in the patch failed.
        +1 asflicense 0m 28s The patch does not generate ASF License warnings.
        59m 9s



        Reason Tests
        Failed junit tests hadoop.ha.TestZKFailoverController



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HADOOP-14563
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12875380/HADOOP-14563-2.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux e46eeed522f7 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / fa1aaee
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        unit https://builds.apache.org/job/PreCommit-HADOOP-Build/12697/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
        Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12697/testReport/
        modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12697/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        shahrs87 Rushabh S Shah added a comment -
        -------------------------------------------------------
         T E S T S
        -------------------------------------------------------
        
        Running org.apache.hadoop.ha.TestZKFailoverController
        Tests run: 20, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.87 sec - in org.apache.hadoop.ha.TestZKFailoverController
        Results :
        Tests run: 20, Failures: 0, Errors: 0, Skipped: 0
        

        TestZKFailoverController passes locally on my node.
        Wei-Chiu Chuang: would you mind giving this a quick review?
        I addressed your comment in the last patch.
        Hopefully this will be the last pass.
        Thanks for the review.

        jojochuang Wei-Chiu Chuang added a comment -

        +1 Thanks!

        jojochuang Wei-Chiu Chuang added a comment -

        Pushed the change to trunk, branch-2 and branch-2.8.
        Thanks for contributing the patch!

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11973 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11973/)
        HADOOP-14563. LoadBalancingKMSClientProvider#warmUpEncryptedKeys (weichiu: rev 8153fe2bd35fb4df0b64f93ac0046e34d4807ac3)

        • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/LoadBalancingKMSClientProvider.java
        • (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/crypto/key/kms/TestLoadBalancingKMSClientProvider.java
        shahrs87 Rushabh S Shah added a comment -

        Thanks Wei-Chiu Chuang for the review and commit.

        xiaochen Xiao Chen added a comment -

        Thanks Rushabh for the contribution and Wei-Chiu for review, and sorry for coming late on this.

        Sounds like LBKMSCP should throw an exception when all providers throw exception?

        I'm not sure this is the correct behavior. Say the NN is configured with an LBKMSCP that has 3 underlying KMSCPs. On warmup, 2 of the KMSCPs fail and the 3rd succeeds. When the NN generates the next EDEK, it has a 1/3 chance of fetching one from the local cache and a 2/3 chance of making a call to the KMS.
        Is this acceptable? I'm leaning toward 'fail' rather than 'continue' in Rushabh's original question.

        Arun Suresh, what are your thoughts as the initial author?
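
        The 1/3 versus 2/3 estimate can be reproduced with a small model, assuming requests are spread evenly across providers in round-robin order and that a cold provider stays cold (in practice its cache may fill after the first call). `CacheHitSketch` and `warmHits` are hypothetical names used only for this illustration.

```java
final class CacheHitSketch {
  /**
   * Counts how many of the given requests land on a warmed-up provider
   * when providers are picked in round-robin order.
   */
  static int warmHits(boolean[] warmed, int requests) {
    int hits = 0;
    for (int i = 0; i < requests; i++) {
      if (warmed[i % warmed.length]) {
        hits++;
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    // Two providers failed warm-up, the third succeeded.
    boolean[] warmed = {false, false, true};
    int requests = 300;
    int hits = warmHits(warmed, requests);
    System.out.println(hits + " of " + requests + " requests hit a warm cache");
  }
}
```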

        shahrs87 Rushabh S Shah added a comment -

        When NN is generating the next edek, it has 1/3 chance to fetch one from local cache, and 2/3 chance to make a call to KMS.
        Is this acceptable?

        Before this fix, even if warmup failed on all 3 providers, no exception was thrown, and the probability of making a synchronous call to the KMS (when generating an EDEK) was 100%.
        At least after the fix, if warmUpEncryptedKeys fails on all 3 providers, creation of the encryption zone fails.

        As part of HDFS-12124, or maybe a separate JIRA, I am thinking of handling the EDEK-generation case.
        If one provider returns null, it will try all the providers before giving up and throwing a RetryStartFile exception to DFS clients.
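
        A rough sketch of that idea, under stated assumptions: `EdekSource`, `TryAllProvidersSketch` and `firstNonNullEdek` are hypothetical names, a null return models a provider with nothing cached, and a plain IOException stands in for the exception that would be sent to DFS clients.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical provider view: returns a cached EDEK, or null if none is ready. */
interface EdekSource {
  String generateEdek(String keyName) throws IOException;
}

final class TryAllProvidersSketch {
  /** Asks each provider in turn and gives up only after all returned null. */
  static String firstNonNullEdek(List<EdekSource> providers, String keyName)
      throws IOException {
    for (EdekSource provider : providers) {
      String edek = provider.generateEdek(keyName);
      if (edek != null) {
        return edek;
      }
    }
    // Stand-in for the retry exception mentioned in the comment above.
    throw new IOException("No provider returned an EDEK for " + keyName);
  }

  public static void main(String[] args) throws IOException {
    EdekSource cold = keyName -> null;
    EdekSource warm = keyName -> "edek-for-" + keyName;
    System.out.println(firstNonNullEdek(Arrays.asList(cold, warm), "key1"));
  }
}
```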

        xiaochen Xiao Chen added a comment -

        Yep, agreed the behavior before this is worse.

        I think a successful warmUp should warm up all providers. We can work on HDFS-12124 first, and visit this (or a follow-on) after.
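
        One way the stricter semantics could look, as a sketch only: every provider is still attempted, but any single failure makes the whole warm-up fail, with later failures attached as suppressed exceptions. `StrictProvider`, `StrictWarmUpSketch` and `warmUpAllOrFail` are hypothetical names, not part of Hadoop or of the committed patch.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for KMSClientProvider; not the real Hadoop class. */
interface StrictProvider {
  void warmUpEncryptedKeys(String... keyNames) throws IOException;
}

final class StrictWarmUpSketch {
  /** Attempts every provider; throws if any one of them failed. */
  static void warmUpAllOrFail(List<StrictProvider> providers, String... keyNames)
      throws IOException {
    IOException first = null;
    for (StrictProvider provider : providers) {
      try {
        provider.warmUpEncryptedKeys(keyNames);
      } catch (IOException ioe) {
        if (first == null) {
          first = ioe;
        } else {
          first.addSuppressed(ioe); // keep later failures visible to the caller
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }

  public static void main(String[] args) {
    StrictProvider ok = keys -> { };
    StrictProvider bad = keys -> { throw new IOException("KMS down"); };
    try {
      warmUpAllOrFail(Arrays.asList(ok, bad, bad), "key1");
    } catch (IOException e) {
      System.out.println(e.getMessage() + ", suppressed: " + e.getSuppressed().length);
    }
  }
}
```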


          People

          • Assignee: shahrs87 Rushabh S Shah
          • Reporter: shahrs87 Rushabh S Shah
          • Votes: 0
          • Watchers: 4
