Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-11702

Remove indefinite caching of key provider uri in DFSClient

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
    • Component/s: hdfs-client
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      There is an indefinite caching of key provider uri in dfsclient.
      Relevant piece of code.

      DFSClient.java
        /**
         * The key provider uri is searched in the following order.
         * 1. If there is a mapping in Credential's secrets map for namenode uri.
         * 2. From namenode getServerDefaults rpc.
         * 3. Finally fallback to local conf.
         * @return keyProviderUri if found from either of above 3 cases,
         * null otherwise
         * @throws IOException
         */
        URI getKeyProviderUri() throws IOException {
          if (keyProviderUri != null) {
            return keyProviderUri;
          }
          // Lookup the secret in credentials object for namenodeuri.
          Credentials credentials = ugi.getCredentials();
         ...
         ...
      

      Once the key provider uri is set, it won't refresh the value even if the key provider uri on namenode is changed.
      For long running clients like on oozie servers, this means we have to bounce all the oozie servers to get the change reflected.
      After this change, the client will cache the value for an hour after which it will issue getServerDefaults call and will refresh the key provider uri.

      1. HDFS-11702.patch
        5 kB
        Rushabh S Shah

        Issue Links

          Activity

          Hide
          shahrs87 Rushabh S Shah added a comment -

          Attaching a simple patch.

          Show
          shahrs87 Rushabh S Shah added a comment - Attaching a simple patch.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 33s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 32s Maven dependency ordering for branch
          +1 mvninstall 13m 2s trunk passed
          +1 compile 1m 22s trunk passed
          +1 checkstyle 0m 40s trunk passed
          +1 mvnsite 1m 26s trunk passed
          +1 mvneclipse 0m 24s trunk passed
          -1 findbugs 1m 26s hadoop-hdfs-project/hadoop-hdfs-client in trunk has 2 extant Findbugs warnings.
          -1 findbugs 1m 43s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings.
          +1 javadoc 1m 2s trunk passed
          0 mvndep 0m 6s Maven dependency ordering for patch
          +1 mvninstall 1m 17s the patch passed
          +1 compile 1m 19s the patch passed
          +1 javac 1m 19s the patch passed
          +1 checkstyle 0m 37s the patch passed
          +1 mvnsite 1m 17s the patch passed
          +1 mvneclipse 0m 24s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 3m 5s the patch passed
          +1 javadoc 0m 55s the patch passed
          +1 unit 1m 12s hadoop-hdfs-client in the patch passed.
          -1 unit 64m 7s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          98m 23s



          Reason Tests
          Failed junit tests hadoop.hdfs.web.TestWebHdfsTimeouts
          Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ac17dc
          JIRA Issue HDFS-11702
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12865017/HDFS-11702.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 1155131d187c 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 475f933
          Default Java 1.8.0_121
          findbugs v3.1.0-RC1
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19200/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19200/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/19200/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19200/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19200/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 33s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. 0 mvndep 0m 32s Maven dependency ordering for branch +1 mvninstall 13m 2s trunk passed +1 compile 1m 22s trunk passed +1 checkstyle 0m 40s trunk passed +1 mvnsite 1m 26s trunk passed +1 mvneclipse 0m 24s trunk passed -1 findbugs 1m 26s hadoop-hdfs-project/hadoop-hdfs-client in trunk has 2 extant Findbugs warnings. -1 findbugs 1m 43s hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings. +1 javadoc 1m 2s trunk passed 0 mvndep 0m 6s Maven dependency ordering for patch +1 mvninstall 1m 17s the patch passed +1 compile 1m 19s the patch passed +1 javac 1m 19s the patch passed +1 checkstyle 0m 37s the patch passed +1 mvnsite 1m 17s the patch passed +1 mvneclipse 0m 24s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 3m 5s the patch passed +1 javadoc 0m 55s the patch passed +1 unit 1m 12s hadoop-hdfs-client in the patch passed. -1 unit 64m 7s hadoop-hdfs in the patch failed. +1 asflicense 0m 16s The patch does not generate ASF License warnings. 98m 23s Reason Tests Failed junit tests hadoop.hdfs.web.TestWebHdfsTimeouts Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2 Subsystem Report/Notes Docker Image:yetus/hadoop:0ac17dc JIRA Issue HDFS-11702 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12865017/HDFS-11702.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 1155131d187c 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 475f933 Default Java 1.8.0_121 findbugs v3.1.0-RC1 findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19200/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/19200/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html unit https://builds.apache.org/job/PreCommit-HDFS-Build/19200/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/19200/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project Console output https://builds.apache.org/job/PreCommit-HDFS-Build/19200/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          hanishakoneru Hanisha Koneru added a comment -

          Rushabh S Shah, the patch doesn't apply cleanly for me.

          • Deleting DFSClient#keyProviderUri is breaking the code. It is accessed in many places besides DFSClient#getKeyProviderUri
          • After this change, the client will cache the value for an hour after which it will issue getServerDefaults call and will refresh the key provider uri.

            Every time DFSClient#getKeyProviderUri is called, keyProviderUri is set to null and recalulated. There is no caching being done.

          Show
          hanishakoneru Hanisha Koneru added a comment - Rushabh S Shah , the patch doesn't apply cleanly for me. Deleting DFSClient#keyProviderUri is breaking the code. It is accessed in many places besides DFSClient#getKeyProviderUri After this change, the client will cache the value for an hour after which it will issue getServerDefaults call and will refresh the key provider uri. Every time DFSClient#getKeyProviderUri is called, keyProviderUri is set to null and recalulated. There is no caching being done.
          Hide
          shahrs87 Rushabh S Shah added a comment -

          Thanks Hanisha Koneru for looking into this jira.

          the patch doesn't apply cleanly for me.

          Which branch are you applying on ?
          This patch is for trunk. I haven't tested on 2.8 or 2.7.
          Given that jenkins build didn't complain about applying/compiling, that makes me think maybe something wrong in your environment ?

          Every time DFSClient#getKeyProviderUri is called, keyProviderUri is set to null and recalulated. There is no caching being done.

          If you notice, then there are mainly 2 places to get keyprovider uri other than local conf.
          1. byte[] keyProviderUriBytes = credentials.getSecretKey(getKeyProviderMapKey());
          First if dfs client is in a task and if EZ was enabled during job submission then the keyprovider from which kms token was fetched is added to credentials object.

          2. FsServerDefaults serverDefaults = getServerDefaults();
          If you notice the getServerDefaults carefully, then the caching is done there. The cache is good for an hour after which it will again call Namenode#getServerDefaults.
          Relevant piece of code

          DFSClient.java
            public FsServerDefaults getServerDefaults() throws IOException {
              checkOpen();
              long now = Time.monotonicNow();
              if ((serverDefaults == null) ||
                  (now - serverDefaultsLastUpdate > SERVER_DEFAULTS_VALIDITY_PERIOD)) {
                serverDefaults = namenode.getServerDefaults();
                serverDefaultsLastUpdate = now;
              }
              assert serverDefaults != null;
              return serverDefaults;
            }
          

          Let me know if you I missed something.

          Show
          shahrs87 Rushabh S Shah added a comment - Thanks Hanisha Koneru for looking into this jira. the patch doesn't apply cleanly for me. Which branch are you applying on ? This patch is for trunk. I haven't tested on 2.8 or 2.7. Given that jenkins build didn't complain about applying/compiling, that makes me think maybe something wrong in your environment ? Every time DFSClient#getKeyProviderUri is called, keyProviderUri is set to null and recalulated. There is no caching being done. If you notice, then there are mainly 2 places to get keyprovider uri other than local conf. 1. byte[] keyProviderUriBytes = credentials.getSecretKey(getKeyProviderMapKey()); First if dfs client is in a task and if EZ was enabled during job submission then the keyprovider from which kms token was fetched is added to credentials object. 2. FsServerDefaults serverDefaults = getServerDefaults(); If you notice the getServerDefaults carefully, then the caching is done there. The cache is good for an hour after which it will again call Namenode#getServerDefaults. Relevant piece of code DFSClient.java public FsServerDefaults getServerDefaults() throws IOException { checkOpen(); long now = Time.monotonicNow(); if ((serverDefaults == null ) || (now - serverDefaultsLastUpdate > SERVER_DEFAULTS_VALIDITY_PERIOD)) { serverDefaults = namenode.getServerDefaults(); serverDefaultsLastUpdate = now; } assert serverDefaults != null ; return serverDefaults; } Let me know if you I missed something.
          Hide
          daryn Daryn Sharp added a comment -

          +1 Assuming no other objections. W/o this patch when we changed the key provider for a cluster, we also had to restart multiple other services including but not limited to oozie, hive, etc.

          On the original jira some concerns were expressed about the lockless rpc for server defaults (I wouldn't care if it was 1min...), so as a middle ground, I'd suggest making the server defaults refresh interval configurable. It could be incorporated in this patch or a new one. Up to you.

          Show
          daryn Daryn Sharp added a comment - +1 Assuming no other objections. W/o this patch when we changed the key provider for a cluster, we also had to restart multiple other services including but not limited to oozie, hive, etc. On the original jira some concerns were expressed about the lockless rpc for server defaults (I wouldn't care if it was 1min...), so as a middle ground, I'd suggest making the server defaults refresh interval configurable. It could be incorporated in this patch or a new one. Up to you.
          Hide
          shahrs87 Rushabh S Shah added a comment -

          Thanks Daryn Sharp for review.

          I'd suggest making the server defaults refresh interval configurable. It could be incorporated in this patch or a new one

          Created HDFS-11754.

          Show
          shahrs87 Rushabh S Shah added a comment - Thanks Daryn Sharp for review. I'd suggest making the server defaults refresh interval configurable. It could be incorporated in this patch or a new one Created HDFS-11754 .
          Hide
          hanishakoneru Hanisha Koneru added a comment -

          Thanks for clarifying, Rushabh S Shah.
          LGTM.

          Show
          hanishakoneru Hanisha Koneru added a comment - Thanks for clarifying, Rushabh S Shah . LGTM.
          Hide
          andrew.wang Andrew Wang added a comment -

          LGTM, though is there a unit test for this new behavior?

          Show
          andrew.wang Andrew Wang added a comment - LGTM, though is there a unit test for this new behavior?
          Hide
          shahrs87 Rushabh S Shah added a comment - - edited

          Thanks Hanisha Koneru Andrew Wang for the reviews.

          though is there a unit test for this new behavior?

          The test case from HADOOP-14104 are still valid tests.
          I was explicitly calling setKeyProvider(null) from test cases to make sure it doesn't cache indefinitely.

          Show
          shahrs87 Rushabh S Shah added a comment - - edited Thanks Hanisha Koneru Andrew Wang for the reviews. though is there a unit test for this new behavior? The test case from HADOOP-14104 are still valid tests. I was explicitly calling setKeyProvider(null) from test cases to make sure it doesn't cache indefinitely.
          Hide
          kihwal Kihwal Lee added a comment -

          Committed this to trunk, branch-2 and branch-2.8.

          Show
          kihwal Kihwal Lee added a comment - Committed this to trunk, branch-2 and branch-2.8.
          Hide
          shahrs87 Rushabh S Shah added a comment -

          Thanks all for reviews and Kihwal Lee for committing the patch !

          Show
          shahrs87 Rushabh S Shah added a comment - Thanks all for reviews and Kihwal Lee for committing the patch !
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11696 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11696/)
          HDFS-11702. Remove indefinite caching of key provider uri in DFSClient. (kihwal: rev cef2815cf48154fe82f44082dcbdce6373c81284)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11696 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11696/ ) HDFS-11702 . Remove indefinite caching of key provider uri in DFSClient. (kihwal: rev cef2815cf48154fe82f44082dcbdce6373c81284) (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java

            People

            • Assignee:
              shahrs87 Rushabh S Shah
              Reporter:
              shahrs87 Rushabh S Shah
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development