Hadoop Common / HADOOP-13381

KMS clients should use KMS Delegation Tokens from current UGI.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: kms
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When /tmp is set up as an encryption zone (EZ), one may experience YARN log aggregation failures after the very first KMS token expires. The MR job itself runs fine, though.

      When this happens, the YARN NodeManager's log will show an AuthenticationException with "token is expired" or "token can't be found in cache", depending on whether the expired token has been removed by the background cleanup or not.

      Attachments

      1. HADOOP-13381.01.patch
        8 kB
        Xiao Chen
      2. HADOOP-13381.02.patch
        10 kB
        Xiao Chen
      3. HADOOP-13381.03.patch
        10 kB
        Xiao Chen
      4. HADOOP-13381.04.patch
        9 kB
        Xiao Chen

        Issue Links

          Activity

          xyao Xiaoyu Yao added a comment -

          Thanks Xiao Chen for the quick response.

          xiaochen Xiao Chen added a comment -

          Sure Xiaoyu Yao, just pushed this to branch-2.8.

          xyao Xiaoyu Yao added a comment - - edited

          This one is labelled with target version Hadoop-2.8 but is somehow missing from branch-2.8.
          I propose to backport it to branch-2.8; the cherry-pick of the branch-2 commit is clean and TestKMS passes. What do you think, Xiao Chen?

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #10176 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10176/)
          HADOOP-13381. KMS clients should use KMS Delegation Tokens from current (xiao: rev 8ebf2e95d2053cb94c6ff87ca018811fe8276f2b)

          • hadoop-common-project/hadoop-kms/src/test/java/org/apache/hadoop/crypto/key/kms/server/TestKMS.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java
          xiaochen Xiao Chen added a comment -

          Committed to trunk and branch-2. There was a minor conflict (imports) when backporting to branch-2; I compiled and made sure TestKMS passes before pushing.

          Thanks a lot Arun Suresh for the thoughtful reviews, and Andrew Wang for chiming in!

          xiaochen Xiao Chen added a comment -

          Thanks Arun, committing this.

          asuresh Arun Suresh added a comment -

          +1. Thanks for taking care of this, Xiao Chen.

          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 7s Maven dependency ordering for branch
          +1 mvninstall 7m 55s trunk passed
          +1 compile 8m 22s trunk passed
          +1 checkstyle 0m 27s trunk passed
          +1 mvnsite 1m 17s trunk passed
          +1 mvneclipse 0m 23s trunk passed
          +1 findbugs 1m 40s trunk passed
          +1 javadoc 0m 57s trunk passed
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 0m 53s the patch passed
          +1 compile 6m 46s the patch passed
          +1 javac 6m 46s the patch passed
          +1 checkstyle 0m 28s the patch passed
          +1 mvnsite 1m 10s the patch passed
          +1 mvneclipse 0m 26s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 56s the patch passed
          +1 javadoc 0m 57s the patch passed
          +1 unit 7m 1s hadoop-common in the patch passed.
          +1 unit 2m 12s hadoop-kms in the patch passed.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          47m 50s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12820780/HADOOP-13381.04.patch
          JIRA Issue HADOOP-13381
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 0389f1bd4d9b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 26de4f0
          Default Java 1.8.0_101
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10110/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms U: hadoop-common-project
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10110/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          Thank you for the nice suggestion, Arun Suresh.
          Patch 4 attached to address it.

          asuresh Arun Suresh added a comment -

          I agree. I also prefer what you've done in v03.

          Minor nit:
          You could replace the huge if/else with:

          UserGroupInformation ugiToUse = (currentUgiContainsKmsDt() && doAsUser == null) ? currentUgi : actualUgi;
          conn = ugiToUse.doAs(new PrivilegedExceptionAction<HttpURLConnection>() {
            @Override
            public HttpURLConnection run() throws Exception {
              DelegationTokenAuthenticatedURL authUrl =
                  new DelegationTokenAuthenticatedURL(configurator);
              return authUrl.openConnection(url, authToken, doAsUser);
            }
          });
          
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 22s Maven dependency ordering for branch
          +1 mvninstall 6m 55s trunk passed
          +1 compile 6m 49s trunk passed
          +1 checkstyle 0m 28s trunk passed
          +1 mvnsite 1m 14s trunk passed
          +1 mvneclipse 0m 25s trunk passed
          +1 findbugs 1m 41s trunk passed
          +1 javadoc 1m 0s trunk passed
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 0m 54s the patch passed
          +1 compile 6m 47s the patch passed
          +1 javac 6m 47s the patch passed
          +1 checkstyle 0m 27s the patch passed
          +1 mvnsite 1m 10s the patch passed
          +1 mvneclipse 0m 25s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 56s the patch passed
          +1 javadoc 0m 58s the patch passed
          +1 unit 8m 9s hadoop-common in the patch passed.
          +1 unit 2m 17s hadoop-kms in the patch passed.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          46m 46s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12820067/HADOOP-13381.03.patch
          JIRA Issue HADOOP-13381
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 5427676dfd36 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / d383bfd
          Default Java 1.8.0_101
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10081/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms U: hadoop-common-project
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10081/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          Patch 3 to make findbugs happy.

          Arun Suresh,
          Could you please take a look at this when you have a chance? I feel this is the better way to fix the issue: if there's a DT present, the underlying user doesn't matter, and we doAs the DT's UGI to use it. Otherwise, we keep the existing behavior.
          Thanks in advance.
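          The decision rule in this comment — doAs the caller's own UGI when it already carries a kms-dt and no proxy (doAs) user is involved, otherwise fall back to the cached actualUgi — can be sketched as a toy model (class and method names here are illustrative, not the actual KMSClientProvider code):

```java
// Toy model of the patch-3 decision rule; "UgiSelection" and
// "pickUgi" are illustrative names, not Hadoop APIs.
public class UgiSelection {

    // Which UGI should wrap the KMS connection: the caller's own UGI
    // if it already holds a kms-dt and no proxy user is in play,
    // otherwise the cached actualUgi kept for proxy-user support.
    static String pickUgi(boolean currentUgiHasKmsDt, String doAsUser) {
        return (currentUgiHasKmsDt && doAsUser == null)
                ? "currentUgi" : "actualUgi";
    }

    public static void main(String[] args) {
        System.out.println(pickUgi(true, null));    // currentUgi
        System.out.println(pickUgi(true, "proxy")); // actualUgi
        System.out.println(pickUgi(false, null));   // actualUgi
    }
}
```

          In the real client, the chosen UserGroupInformation is the one whose doAs wraps the DelegationTokenAuthenticatedURL#openConnection call.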

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 19s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 7s Maven dependency ordering for branch
          +1 mvninstall 6m 37s trunk passed
          +1 compile 6m 50s trunk passed
          +1 checkstyle 0m 28s trunk passed
          +1 mvnsite 1m 13s trunk passed
          +1 mvneclipse 0m 25s trunk passed
          +1 findbugs 1m 42s trunk passed
          +1 javadoc 0m 59s trunk passed
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 0m 55s the patch passed
          +1 compile 6m 47s the patch passed
          +1 javac 6m 47s the patch passed
          +1 checkstyle 0m 28s the patch passed
          +1 mvnsite 1m 16s the patch passed
          +1 mvneclipse 0m 26s the patch passed
          +1 whitespace 0m 1s The patch has no whitespace issues.
          -1 findbugs 1m 35s hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 javadoc 1m 0s the patch passed
          +1 unit 7m 41s hadoop-common in the patch passed.
          +1 unit 2m 17s hadoop-kms in the patch passed.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          46m 9s



          Reason Tests
          FindBugs module:hadoop-common-project/hadoop-common
            Load of known null value in org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(URL, String) At KMSClientProvider.java:in org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(URL, String) At KMSClientProvider.java:[line 542]



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12819148/HADOOP-13381.02.patch
          JIRA Issue HADOOP-13381
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux dccc7992d7f2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 38128ba
          Default Java 1.8.0_91
          findbugs v3.0.0
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/10039/artifact/patchprocess/new-findbugs-hadoop-common-project_hadoop-common.html
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10039/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms U: hadoop-common-project
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10039/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          I extended my earlier thought further, and think this is a decent way to fix the issue.

          Attaching patch 2 to demonstrate the idea; I can also verify end-to-end if we agree on this approach.

          Could you take a look, Arun Suresh? Thanks a lot!

          xiaochen Xiao Chen added a comment -

          Thank you for the continued discussion, Arun.

          Sorry, I missed one point in your proposal... it wouldn't work as we hoped.

          4. Then, we let the retry happen, at which point it will get a new delegation token.

          IIUC, the authToken is there to cache past successful authentications (so we don't have to authenticate every time). It does not 'get a new delegation token'. Instead, it just picks up the kms-dt from the UGI's current user inside DelegationTokenAuthenticatedURL#openConnection, which happens inside the actualUgi.doAs in KMSCP#createConnection. So retries will still see the same expired DT (or no DT at all if we remove it). We have to get the DT from the UGI's current user before actualUgi.doAs... right?

          Let me elaborate on the race I was thinking:
          I did a test as follows:

          1. set /tmp as an EZ
          2. run a MR job (wordcount) as user mapred, over /tmp. Let's call this job1
          3. run a MR job (wordcount) as user impala, over /tmp. Let's call this job2.
          4. got the logs below from my customized logging in KMSCP#createConnection:
          2016-07-19 14:35:18,306 INFO org.apache.hadoop.crypto.key.kms.KMSClientProvider: ==== currentUGI:impala (auth:SIMPLE) creds: [Kind: kms-dt, Service: 172.31.9.35:16000, Ident: 00 06 69 6d 70 61 6c 61 04 79 61 72 6e 00 8a 01 56 05 15 10 22 8a 01 56 05 17 cf 42 02 02, Kind: mapreduce.job, Service: job_1468963667277_0002, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@2e951fb5), Kind: HDFS_DELEGATION_TOKEN, Service: 172.31.9.72:8020, Ident: (token for impala: HDFS_DELEGATION_TOKEN owner=impala@GCE.CLOUDERA.COM, renewer=yarn, realUser=, issueDate=1468964081478, maxDate=1468964381478, sequenceNumber=216, masterKeyId=20)]
          2016-07-19 14:35:18,307 INFO org.apache.hadoop.crypto.key.kms.KMSClientProvider: ==== actualUGI: mapred (auth:SIMPLE) creds: [Kind: kms-dt, Service: 172.31.9.35:16000, Ident: 00 06 6d 61 70 72 65 64 04 79 61 72 6e 00 8a 01 56 05 11 b5 db 8a 01 56 05 14 74 fb 01 02, Kind: mapreduce.job, Service: job_1468963667277_0001, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@7fdacda0), Kind: HDFS_DELEGATION_TOKEN, Service: 172.31.9.72:8020, Ident: (token for mapred: HDFS_DELEGATION_TOKEN owner=mapred@GCE.CLOUDERA.COM, renewer=yarn, realUser=, issueDate=1468963861782, maxDate=1468964161782, sequenceNumber=215, masterKeyId=20)]
          

          Note that the actual UGI here is entirely mapred's. If job1 is about to call actualUgi.doAs while job2 has updated the credentials in actualUgi, job1 will then see job2's DT when the invocation goes into DTAURL... right?

          My drive-home thinking is that we should doAs the current UGI in this specific case (or retry with the current UGI)... namely, when this is null.

          asuresh Arun Suresh added a comment -

          ..assuming we loosen the retry check of response message..

          Agreed... I guess that should be fine.

          With respect to the race condition, I'm not really worried. The worst that can happen, if we follow the flow I specified in my earlier comment (when multiple threads call the same KMSClientProvider at a time when the DT has expired), is that simultaneous refreshes of the UGI's credentials will happen, but I don't think there would be any UGI state inconsistency. Besides, UGI::addCredential is synchronized on the subject.

          xiaochen Xiao Chen added a comment -

          Thanks Arun Suresh for the quick response! The flow you mentioned would work, assuming we loosen the retry check of the response message (these lines), and add the remove-token method to UGI.

          On the multi-thread side, did I miss anything? If many threads running in LogAggregationService try to do log aggregation and end up with the same cached KMSCP, would this cause a race? IMO this problem existed before this patch, but maybe I missed something... I don't think the cached authToken works under this scenario.

          asuresh Arun Suresh added a comment -

          So... I was thinking we should do the following:

          1. Ensure the NM creates the DFSClient on boot-up, so that the actualUgi is the yarn user.
          2. Add a method in UserGroupInformation to remove credentials, so that you can remove the KMS-DT from the actualUgi.
          3. After the token has expired and we get an authorization exception, in addition to flushing the authToken (line 592 in KMSClientProvider), we also call the new method from the previous point to remove the KMS-DT.
          4. Then we let the retry happen, at which point it will get a new delegation token.

          Makes sense?
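          The four steps above can be modeled as a toy sketch (ToyUgi and its removeToken stand in for the UserGroupInformation changes step 2 proposes; none of this is real Hadoop API). Note that, as discussed elsewhere in this thread, the real retry path re-resolves the DT from the same cached UGI, so step 4 is modeled optimistically here:

```java
import java.util.ArrayList;
import java.util.List;

// Toy walk-through of the proposed retry flow; ToyUgi is a stand-in
// for UserGroupInformation, and removeToken models the new
// remove-credentials method that step 2 proposes to add.
public class RetryFlow {

    static class ToyUgi {
        final List<String> tokens = new ArrayList<>();
        // Step 2: hypothetical "remove credentials" method on UGI.
        void removeToken(String kindPrefix) {
            tokens.removeIf(t -> t.startsWith(kindPrefix));
        }
    }

    // Returns the token used on the attempt that finally succeeds.
    static String connectWithRetry(ToyUgi ugi) {
        for (int attempt = 0; attempt < 2; attempt++) {
            String dt = ugi.tokens.isEmpty() ? null : ugi.tokens.get(0);
            if (dt != null && dt.endsWith("expired")) {
                // Step 3: auth failed -> besides flushing the authToken
                // (not modeled), drop the expired kms-dt from the UGI.
                ugi.removeToken("kms-dt");
                continue;
            }
            // Step 4: on retry with no cached DT, the client would
            // fetch a fresh delegation token (modeled as a constant).
            return dt != null ? dt : "kms-dt:fresh";
        }
        return null;
    }

    public static void main(String[] args) {
        ToyUgi ugi = new ToyUgi();
        ugi.tokens.add("kms-dt:expired");
        System.out.println(connectWithRetry(ugi)); // kms-dt:fresh
    }
}
```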
          xiaochen Xiao Chen added a comment -

          the race that multiple threads calling the same cached KMSCP

          The problem becomes tougher when considering multi-threading... The cached actualUgi is there to handle proxy users, per HADOOP-10698 and HADOOP-11176, so we need it as the initial UGI.

          For the DT case, we want to pass in the latest credentials. However, the DT fetching always happens inside actualUgi.doAs, which is cached and not updated. I can see the race where more than one thread from comment #1 reaches the same KMSCP, and what we do here would be troublesome.

          I don't see a decent solution so far; this needs more thought... Feel free to speak up with any suggestions.

          xiaochen Xiao Chen added a comment -

          I had an offline discussion with Arun Suresh; here are the minutes:

          • Arun brought up the point that there's an authRetry in KMSCP: when the authToken is expired, a new DelegationTokenAuthenticatedURL.Token is created and the call is retried.
            This doesn't help in our case, since (in the code inside the call) the UGI's credentials are used to get the kms-dt, which would be the same expired token.
          • Regarding Yarn log aggregation, I explained that MR jobs get tokens and run, and at the end the NM uses that job's tokens to do Yarn log aggregation as a final MR job. So this part should be done as the MR user (as opposed to the NM user, yarn), since it writes to the MR user's dir /tmp/logs/user/..... cc Robert Kanter in case anything I said is not accurate.
          • To minimize impact, we should only update the kms-dt in the call.
          • Arun has a general concern about updating the actualUgi's token, since the normal use case is doAs / proxy user. This could be enhanced in another jira.

          (My thought after the discussion): to counter the race of multiple threads calling the same cached KMSCP, we should create a new UGI object and update the tokens.
          Will update a patch with more details.
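          The per-call fresh-UGI idea can be sketched like this (stand-in types, not the real UserGroupInformation; `ugiForCall` is a hypothetical helper): build a new credentials map from the cached actualUgi's tokens, then overlay the calling thread's current tokens, so the shared UGI is never mutated and concurrent callers never race on one token map.

```java
import java.util.HashMap;
import java.util.Map;

public class FreshUgiSketch {
    // Minimal stand-in for a UGI: just a private credentials map.
    static class Ugi {
        final Map<String, String> creds = new HashMap<>();
        Ugi(Map<String, String> initial) {
            creds.putAll(initial);
        }
    }

    // Build a fresh per-call UGI: the calling thread's current tokens
    // take precedence over the cached actualUgi's tokens.
    static Ugi ugiForCall(Map<String, String> actualUgiCreds,
                          Map<String, String> currentCreds) {
        Ugi fresh = new Ugi(actualUgiCreds);   // copy, don't mutate the shared map
        fresh.creds.putAll(currentCreds);      // latest tokens win
        return fresh;
    }

    public static void main(String[] args) {
        Map<String, String> cached = new HashMap<>();
        cached.put("kms-dt", "expired");       // stale token in the shared UGI
        Map<String, String> current = new HashMap<>();
        current.put("kms-dt", "new");          // caller's fresh token

        Ugi u = ugiForCall(cached, current);
        System.out.println(u.creds.get("kms-dt"));  // prints "new"
        System.out.println(cached.get("kms-dt"));   // prints "expired" (untouched)
    }
}
```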

          xiaochen Xiao Chen added a comment -

          Thanks Arun Suresh for the response.

          Colin and Andrew can confirm on the ClientContext, but I think the problem here is the cached actualUgi inside the KMSClientProvider. Since the same client will get the provider from the cache, the actualUgi in the cached provider is in turn cached, without updated credentials. Later, the DT is fetched from the UGI's out-dated credentials (code).

          asuresh Arun Suresh added a comment -

          Thanks for opening this, Xiao Chen.
          I think your approach is definitely simpler than modifying the cache to key on a composite key of URI + UGI current user.
          But I think the bigger issue is that it looks like the ClientContext in which the KeyProviderCache is maintained is not a per-user cache. Can you confirm, Colin P. McCabe / Andrew Wang?

          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 33s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 7s Maven dependency ordering for branch
          +1 mvninstall 6m 58s trunk passed
          +1 compile 7m 21s trunk passed
          +1 checkstyle 0m 29s trunk passed
          +1 mvnsite 1m 18s trunk passed
          +1 mvneclipse 0m 26s trunk passed
          +1 findbugs 1m 43s trunk passed
          +1 javadoc 0m 59s trunk passed
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 0m 54s the patch passed
          +1 compile 6m 55s the patch passed
          +1 javac 6m 55s the patch passed
          +1 checkstyle 0m 50s the patch passed
          +1 mvnsite 1m 16s the patch passed
          +1 mvneclipse 0m 26s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 56s the patch passed
          +1 javadoc 0m 59s the patch passed
          +1 unit 6m 59s hadoop-common in the patch passed.
          +1 unit 2m 16s hadoop-kms in the patch passed.
          +1 asflicense 0m 24s The patch does not generate ASF License warnings.
          47m 5s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12818255/HADOOP-13381.01.patch
          JIRA Issue HADOOP-13381
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux c550c20e093f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / ea9f437
          Default Java 1.8.0_91
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/10010/testReport/
          modules C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms U: hadoop-common-project
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10010/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          andrew.wang Andrew Wang added a comment -

          Arun Suresh do you think you can review this one? The fix is a one-liner, but the discussion here is mostly about YARN, and it looks like you wrote the original cache code.

          xiaochen Xiao Chen added a comment -

          The KeyProviderCache is necessary according to HDFS-7718.

          To fix the issue, I can think of 2 options:

          • Change the cache to recognize different clients.
          • Update KMSClientProvider to favor new tokens.

          I chose option #2 because #1 would grow the number of KeyProvider objects with the number of clients, and even then we would still need to update the tokens, since after an MR job a token may be explicitly cancelled.

          This is more of an end-to-end thing, but I tried to mimic it in TestKMS to keep it simple. Patch 1 attached.

          Thanks Robert Kanter again for helping me understand Yarn log aggregation! Ping Arun Suresh / Andrew Wang for review, thank you in advance.
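          A toy comparison of the two options above (hypothetical cache types; the real code caches KeyProvider instances): keying by URI + user (option #1) makes the number of cached providers grow with the number of distinct client users, while keying by URI alone (option #2) keeps one shared provider and pushes token freshness into the call path.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheOptionsSketch {
    // Option #1: cache keyed by (URI, user).
    static Map<String, Object> byUriAndUser = new HashMap<>();
    // Option #2: cache keyed by URI only.
    static Map<String, Object> byUri = new HashMap<>();

    static Object getOption1(String uri, String user) {
        return byUriAndUser.computeIfAbsent(uri + "|" + user, k -> new Object());
    }

    static Object getOption2(String uri) {
        return byUri.computeIfAbsent(uri, k -> new Object());
    }

    public static void main(String[] args) {
        for (String user : new String[] {"alice", "bob", "carol"}) {
            getOption1("kms://host:16000/kms", user);
            getOption2("kms://host:16000/kms");
        }
        // Option #1 holds one provider per user; option #2 holds one total.
        System.out.println(byUriAndUser.size());  // prints 3
        System.out.println(byUri.size());         // prints 1
    }
}
```

          The sketch also shows why option #1 alone is not enough: even a per-user provider holds whatever tokens existed at construction, so explicitly cancelled tokens would still need refreshing.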

          xiaochen Xiao Chen added a comment -

          I figured the clearest way to explain this is to show the call stack:

          	at org.apache.hadoop.crypto.key.kms.KMSClientProvider.<init>(KMSClientProvider.java:461)
          	at org.apache.hadoop.crypto.key.kms.KMSClientProvider$Factory.createProvider(KMSClientProvider.java:331)
          	at org.apache.hadoop.crypto.key.kms.KMSClientProvider$Factory.createProvider(KMSClientProvider.java:322)
          	at org.apache.hadoop.crypto.key.KeyProviderFactory.get(KeyProviderFactory.java:95)
          	at org.apache.hadoop.util.KMSUtil.createKeyProvider(KMSUtil.java:65)
          	at org.apache.hadoop.hdfs.DFSUtil.createKeyProvider(DFSUtil.java:1851)
          	at org.apache.hadoop.hdfs.KeyProviderCache$2.call(KeyProviderCache.java:73)
          	at org.apache.hadoop.hdfs.KeyProviderCache$2.call(KeyProviderCache.java:70)
          	at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
          	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
          	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
          	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
          	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
          	at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
          	at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
          	at org.apache.hadoop.hdfs.KeyProviderCache.get(KeyProviderCache.java:70)
          	at org.apache.hadoop.hdfs.DFSClient.getKeyProvider(DFSClient.java:3570)
          	at org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1408)
          	at org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:1521)
          	at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:108)
          	at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:59)
          	at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
          	at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:683)
          	at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:679)
          	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
          	at org.apache.hadoop.fs.FileContext.create(FileContext.java:679)
          	at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter$1.run(AggregatedLogFormat.java:385)
          	at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter$1.run(AggregatedLogFormat.java:380)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at javax.security.auth.Subject.doAs(Subject.java:415)
          	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
          	at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.<init>(AggregatedLogFormat.java:379)
          	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:246)
          	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:456)
          	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:421)
          	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:386)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          	at java.lang.Thread.run(Thread.java:745)
          

          So there's a KeyProviderCache which caches the KeyProvider object by the configured URI. Meanwhile, KMSClientProvider caches the actualUgi of its creator. This is fine for transient clients, but problematic for long-running processes like the NodeManager.
          When the NM impersonates the client and uses the client's delegation token to run an MR job, KMSClientProvider should favor the client's DT, not the cached one, which may have long since expired.
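          The staleness described above can be illustrated with a minimal stand-in cache (not the real KeyProviderCache or KMSClientProvider): the provider captures the creator's credentials at construction, and because the cache is keyed only by URI, a later caller with fresh credentials still gets the old snapshot.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleUgiSketch {
    static class Provider {
        // Snapshot of the creator's credentials, like the cached actualUgi.
        final Map<String, String> capturedCreds;
        Provider(Map<String, String> creds) {
            this.capturedCreds = new HashMap<>(creds);
        }
    }

    // Cache keyed only by URI: later callers get whoever created it first.
    static Map<String, Provider> cache = new HashMap<>();

    static Provider get(String uri, Map<String, String> callerCreds) {
        return cache.computeIfAbsent(uri, u -> new Provider(callerCreds));
    }

    public static void main(String[] args) {
        Map<String, String> first = new HashMap<>();
        first.put("kms-dt", "token-1");          // first job's token
        get("kms://host", first);

        Map<String, String> later = new HashMap<>();
        later.put("kms-dt", "token-2");          // later job's fresh token
        Provider p = get("kms://host", later);

        // The cached provider still holds the first (possibly expired) token.
        System.out.println(p.capturedCreds.get("kms-dt"));  // prints "token-1"
    }
}
```

          In a short-lived client process this never matters; in a NodeManager that lives across many jobs and token lifetimes, it is exactly the failure mode seen in log aggregation.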


            People

              Assignee: Xiao Chen
              Reporter: Xiao Chen
              Votes: 0
              Watchers: 6