Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-14044

Synchronization issue in delegation token cancel functionality

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.2
    • Component/s: None
    • Labels:
      None

      Description

      We are using Hadoop delegation token authentication functionality in Apache Solr. As part of the integration testing, I found following issue with the delegation token cancelation functionality.

      Consider a setup with 2 Solr servers (S1 and S2) which are configured to use delegation token functionality backed by Zookeeper. Now invoke following steps,

      [Step 1] Send a request to S1 to create a delegation token.
      (Delegation token DT is created successfully)
      [Step 2] Send a request to cancel DT to S2
      (DT is canceled successfully. client receives HTTP 200 response)
      [Step 3] Send a request to cancel DT to S2 again
      (DT cancelation fails. client receives HTTP 404 response)
      [Step 4] Send a request to cancel DT to S1

      At this point we get two different responses.

      • DT cancelation fails. client receives HTTP 404 response
      • DT cancelation succeeds. client receives HTTP 200 response

      Also as per the current implementation, each server maintains an in_memory cache of current tokens which is updated using the ZK watch mechanism. e.g. the ZK watch on S1 will ensure that the in_memory cache is synchronized after step 2.

      After investigation, I found the root cause for this behavior is due to the race condition between step 4 and the firing of ZK watch on S1. Whenever the watch fires before the step 4 - we get HTTP 404 response (as expected). When that is not the case - we get HTTP 200 response along with following ERROR message in the log,

      Attempted to remove a non-existing znode /ZKDTSMTokensRoot/DT_XYZ
      

      From client perspective, the server should return HTTP 404 error when the cancel request is sent out for an invalid token.

      Ref: Here is the relevant Solr unit test for reference,
      https://github.com/apache/lucene-solr/blob/746786636404cdb8ce505ed0ed02b8d9144ab6c4/solr/core/src/test/org/apache/solr/cloud/TestSolrCloudWithDelegationTokens.java#L285

      1. HADOOP-14044-003.patch
        2 kB
        Hrishikesh Gadre
      2. HADOOP-14044-002.patch
        2 kB
        Hrishikesh Gadre
      3. HADOOP-14044-001.patch
        4 kB
        Hrishikesh Gadre
      4. dt_success.log
        4 kB
        Hrishikesh Gadre
      5. dt_fail.log
        5 kB
        Hrishikesh Gadre

        Issue Links

          Activity

          Hide
          hgadre Hrishikesh Gadre added a comment -

          I am attaching the two runs of the above mentioned unit test. Please note that the assertion message is a bit misleading. It basically means that the test expected to receive 404 response. But it received 200 response.

          Show
          hgadre Hrishikesh Gadre added a comment - I am attaching the two runs of the above mentioned unit test. Please note that the assertion message is a bit misleading. It basically means that the test expected to receive 404 response. But it received 200 response.
          Hide
          hgadre Hrishikesh Gadre added a comment -

          Xiao Chen Arun Suresh Here is a patch which fixes this race condition. Can you please review this and let me know your feedback ?

          Show
          hgadre Hrishikesh Gadre added a comment - Xiao Chen Arun Suresh Here is a patch which fixes this race condition. Can you please review this and let me know your feedback ?
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 12m 48s trunk passed
          +1 compile 13m 25s trunk passed
          +1 checkstyle 0m 30s trunk passed
          +1 mvnsite 1m 3s trunk passed
          +1 mvneclipse 0m 19s trunk passed
          +1 findbugs 1m 25s trunk passed
          +1 javadoc 0m 48s trunk passed
          +1 mvninstall 0m 36s the patch passed
          -1 compile 3m 8s root in the patch failed.
          -1 javac 3m 8s root in the patch failed.
          +1 checkstyle 0m 23s the patch passed
          +1 mvnsite 0m 52s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 39s the patch passed
          +1 javadoc 0m 42s the patch passed
          +1 unit 8m 52s hadoop-common in the patch passed.
          +1 asflicense 0m 22s The patch does not generate ASF License warnings.
          48m 47s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HADOOP-14044
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12850319/HADOOP-14044-001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 038987cc59ba 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 258991d
          Default Java 1.8.0_121
          findbugs v3.0.0
          compile https://builds.apache.org/job/PreCommit-HADOOP-Build/11549/artifact/patchprocess/patch-compile-root.txt
          javac https://builds.apache.org/job/PreCommit-HADOOP-Build/11549/artifact/patchprocess/patch-compile-root.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11549/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11549/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 13s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 12m 48s trunk passed +1 compile 13m 25s trunk passed +1 checkstyle 0m 30s trunk passed +1 mvnsite 1m 3s trunk passed +1 mvneclipse 0m 19s trunk passed +1 findbugs 1m 25s trunk passed +1 javadoc 0m 48s trunk passed +1 mvninstall 0m 36s the patch passed -1 compile 3m 8s root in the patch failed. -1 javac 3m 8s root in the patch failed. +1 checkstyle 0m 23s the patch passed +1 mvnsite 0m 52s the patch passed +1 mvneclipse 0m 11s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 39s the patch passed +1 javadoc 0m 42s the patch passed +1 unit 8m 52s hadoop-common in the patch passed. +1 asflicense 0m 22s The patch does not generate ASF License warnings. 48m 47s Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue HADOOP-14044 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12850319/HADOOP-14044-001.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 038987cc59ba 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 258991d Default Java 1.8.0_121 findbugs v3.0.0 compile https://builds.apache.org/job/PreCommit-HADOOP-Build/11549/artifact/patchprocess/patch-compile-root.txt javac https://builds.apache.org/job/PreCommit-HADOOP-Build/11549/artifact/patchprocess/patch-compile-root.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11549/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11549/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks Hrishikesh Gadre for reporting the issue and providing a fix.

          As discussed offline, this behavior from hadoop is non-deterministic currently. I think it logically makes sense to allow the caller a way to know whether the cancellation succeeded.

          The only problem with AbstractDelegationTokenSecretManager is it's Public Evolving. I can't seem to find an explicit saying in Hadoop's compatibility guideline regarding protected methods. But this also could break binary compatibility if some outside jars are compiled with the old version (thanks Andrew Wang for pointing this out!). So I'm afraid the fix in patch 1 can't be committed, because we want to provide binary compat. even across major releases...

          Other options I can think of:

          • change ZKDTSM to throw instead of return false. This would be an incompatible behavior.
          • find a way to guarantee all peer ZKDTSMs see the removal, before returning success for the cancellation. This is against the current ZKDTSM architecture where each ZKDTSM isn't peer-aware

          So it seems there's no good way to satisfy your request of 'when 2 callers are cancelling, exactly 1 should see success'. I guess this may end up inline with zookeeper's documentation - client has to handle it.

          Please feel free to share your thoughts... thanks.

          Show
          xiaochen Xiao Chen added a comment - Thanks Hrishikesh Gadre for reporting the issue and providing a fix. As discussed offline, this behavior from hadoop is non-deterministic currently. I think it logically makes sense to allow the caller a way to know whether the cancellation succeeded. The only problem with AbstractDelegationTokenSecretManager is it's Public Evolving . I can't seem to find an explicit saying in Hadoop's compatibility guideline regarding protected methods. But this also could break binary compatibility if some outside jars are compiled with the old version (thanks Andrew Wang for pointing this out!). So I'm afraid the fix in patch 1 can't be committed, because we want to provide binary compat. even across major releases... Other options I can think of: change ZKDTSM to throw instead of return false. This would be an incompatible behavior. find a way to guarantee all peer ZKDTSMs see the removal, before returning success for the cancellation. This is against the current ZKDTSM architecture where each ZKDTSM isn't peer-aware So it seems there's no good way to satisfy your request of 'when 2 callers are cancelling, exactly 1 should see success'. I guess this may end up inline with zookeeper's documentation - client has to handle it. Please feel free to share your thoughts... thanks.
          Hide
          hgadre Hrishikesh Gadre added a comment - - edited

          Xiao Chen Thanks for the feedback. The patch-1 was just a starting point for the discussion (it doesn't compile fully if you see the build output ).

          So it seems there's no good way to satisfy your request of 'when 2 callers are cancelling, exactly 1 should see success'. I guess this may end up inline with zookeeper's documentation - client has to handle it.

          I don't think that is quite accurate. I understand that if 2 callers invoke cancel operation concurrently then it may not be possible to figure out which client actually deleted the token. But in my case there is just a single caller invoking cancel operation one after another. Hence I think it is a case of application level cache inconsistency which is being exposed to the client.

          If we don't want to binary compatibility at the API level, would it be ok to change the API semantics ? e.g. an alternative would be for the server to return HTTP 200 in all cases (e.g. instead of sending 404 in case of nonexistent token). This will ensure that server provides consistent response to client regardless of cache inconsistency at the server. The REST API semantics for DELETE method also seem to favor this approach.

          What do you think?

          Show
          hgadre Hrishikesh Gadre added a comment - - edited Xiao Chen Thanks for the feedback. The patch-1 was just a starting point for the discussion (it doesn't compile fully if you see the build output ). So it seems there's no good way to satisfy your request of 'when 2 callers are cancelling, exactly 1 should see success'. I guess this may end up inline with zookeeper's documentation - client has to handle it. I don't think that is quite accurate. I understand that if 2 callers invoke cancel operation concurrently then it may not be possible to figure out which client actually deleted the token. But in my case there is just a single caller invoking cancel operation one after another. Hence I think it is a case of application level cache inconsistency which is being exposed to the client. If we don't want to binary compatibility at the API level, would it be ok to change the API semantics ? e.g. an alternative would be for the server to return HTTP 200 in all cases (e.g. instead of sending 404 in case of nonexistent token). This will ensure that server provides consistent response to client regardless of cache inconsistency at the server. The REST API semantics for DELETE method also seem to favor this approach. What do you think?
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks Hrishikesh Gadre. I agree this behavior isn't elegant, but the issue is tricky.

          Your idea of 200 for nonexistent token makes sense to me. But looking at the code it's not straightforward to check.

          The only place to return 404 is this - any IOE thrown when trying to cancel.

          Looking into that code, DelegationTokenManager#cancelToken has 2 steps in general: 1. verifyToken, and 2.cancelToken.
          It seems to me your success log caught the node removal at step 1, hence the No node in path messages from here, and an InvalidToken exception (and 404). The failure log didn't catch this, and went on into step 2, eventually reaching the ZKDTSM#removeStoredToken as we talked, and ended up with the message Attempted to remove a non-existing znode and returned null (and 200).

          Assuming the above understanding is right, I think this ends up being the problem that DelegationTokenManager#cancelToken doesn't provide atomicity in the underlying secret manager. ADTSM's verifyToken and cancelToken is synchronized, but things can change between step1 and step2.

          Given the level of callbacks between ADTSM and ZKDTSM, I think we have 2 directions moving forward:

          • carefully find a way to fix the above problem.
          • document this behavior. (It's also interesting that verifyToken only happens when the canceler passed into cancelToken is null btw...)

          Thoughts?

          Show
          xiaochen Xiao Chen added a comment - Thanks Hrishikesh Gadre . I agree this behavior isn't elegant, but the issue is tricky. Your idea of 200 for nonexistent token makes sense to me. But looking at the code it's not straightforward to check. The only place to return 404 is this - any IOE thrown when trying to cancel. Looking into that code, DelegationTokenManager#cancelToken has 2 steps in general: 1. verifyToken , and 2. cancelToken . It seems to me your success log caught the node removal at step 1, hence the No node in path messages from here , and an InvalidToken exception (and 404). The failure log didn't catch this, and went on into step 2, eventually reaching the ZKDTSM#removeStoredToken as we talked, and ended up with the message Attempted to remove a non-existing znode and returned null (and 200). Assuming the above understanding is right, I think this ends up being the problem that DelegationTokenManager#cancelToken doesn't provide atomicity in the underlying secret manager. ADTSM's verifyToken and cancelToken is synchronized, but things can change between step1 and step2. Given the level of callbacks between ADTSM and ZKDTSM, I think we have 2 directions moving forward: carefully find a way to fix the above problem. document this behavior. (It's also interesting that verifyToken only happens when the canceler passed into cancelToken is null btw...) Thoughts?
          Hide
          hgadre Hrishikesh Gadre added a comment -

          Xiao Chen Thanks for feedback. After reviewing the code, I see another alternative which should fix this problem without breaking the compatibility requirements.

          Currently in ZKDelegationTokenSecretManager#cancelToken method, we attempt to load the token from ZK if it is not available in_memory cache. We invoke the base class cancelToken method regardless the token was available in ZK or not. The fix would be to catch the NODE_NOT_FOUND error explicitly and return 404 error.

          https://github.com/apache/hadoop/blob/b6f290d5b660ad157c7076767c619d02b3d0f894/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java#L854-L858

          Would this be OK ?

          Show
          hgadre Hrishikesh Gadre added a comment - Xiao Chen Thanks for feedback. After reviewing the code, I see another alternative which should fix this problem without breaking the compatibility requirements. Currently in ZKDelegationTokenSecretManager#cancelToken method, we attempt to load the token from ZK if it is not available in_memory cache. We invoke the base class cancelToken method regardless the token was available in ZK or not. The fix would be to catch the NODE_NOT_FOUND error explicitly and return 404 error. https://github.com/apache/hadoop/blob/b6f290d5b660ad157c7076767c619d02b3d0f894/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java#L854-L858 Would this be OK ?
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks Hrishikesh Gadre, I think this is the right direction, nice job.
          Since getTokenInfo checks in-memory then queries ZK, and is again from that Public Evolving ADTSM, throwing may not be as clean.

          But I think we can replace this logic in cancelToken with a new one:

              try {
                if (!currentTokens.containsKey(id)) {
                  // See if token can be retrieved and placed in currentTokens
                  getTokenInfo(id);
                }
                return super.cancelToken(token, canceller);
              } catch (Exception e) {
                LOG.error("Exception while checking if token exist !!", e);
                return id;
              }
          

          We can always check if token exists in ZK, IOE if not. This would give us the 404 you wanted. If it exists, we proceed to add it to the currentTokens.

          I would say this behavior change is a bug fix, so compatible. Only draw back is we have to check ZK for each token cancellation, so more load on ZK and some perf decrease on the cancellation. But don't think this is performance sensitive so trade off for correctness seems worthy.

          Show
          xiaochen Xiao Chen added a comment - Thanks Hrishikesh Gadre , I think this is the right direction, nice job. Since getTokenInfo checks in-memory then queries ZK, and is again from that Public Evolving ADTSM, throwing may not be as clean. But I think we can replace this logic in cancelToken with a new one: try { if (!currentTokens.containsKey(id)) { // See if token can be retrieved and placed in currentTokens getTokenInfo(id); } return super .cancelToken(token, canceller); } catch (Exception e) { LOG.error( "Exception while checking if token exist !!" , e); return id; } We can always check if token exists in ZK, IOE if not. This would give us the 404 you wanted. If it exists, we proceed to add it to the currentTokens . I would say this behavior change is a bug fix, so compatible. Only draw back is we have to check ZK for each token cancellation, so more load on ZK and some perf decrease on the cancellation. But don't think this is performance sensitive so trade off for correctness seems worthy.
          Hide
          hgadre Hrishikesh Gadre added a comment -

          Xiao Chen Thanks for the feedback. Here is a patch implementing this approach. I verified this patch manually on a real cluster. For this I had to disable Zookeeper watch to ensure that the inconsistency between local cache and the ZK state can be reproduced.

          Please take a look and let me have your feedback. Due to concurrency issue, I think it would be difficult to write a unit test for this scenario.

          Show
          hgadre Hrishikesh Gadre added a comment - Xiao Chen Thanks for the feedback. Here is a patch implementing this approach. I verified this patch manually on a real cluster. For this I had to disable Zookeeper watch to ensure that the inconsistency between local cache and the ZK state can be reproduced. Please take a look and let me have your feedback. Due to concurrency issue, I think it would be difficult to write a unit test for this scenario.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 14m 49s trunk passed
          +1 compile 13m 36s trunk passed
          +1 checkstyle 0m 29s trunk passed
          +1 mvnsite 1m 12s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 1m 32s trunk passed
          +1 javadoc 0m 49s trunk passed
          +1 mvninstall 0m 39s the patch passed
          +1 compile 10m 59s the patch passed
          +1 javac 10m 59s the patch passed
          +1 checkstyle 0m 28s the patch passed
          +1 mvnsite 1m 0s the patch passed
          +1 mvneclipse 0m 19s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 36s the patch passed
          +1 javadoc 0m 52s the patch passed
          -1 unit 8m 7s hadoop-common in the patch failed.
          +1 asflicense 0m 36s The patch does not generate ASF License warnings.
          59m 32s



          Reason Tests
          Failed junit tests hadoop.ha.TestZKFailoverController



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HADOOP-14044
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12850664/HADOOP-14044-002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 500b91fa3ec3 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 0914fcc
          Default Java 1.8.0_121
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/11563/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11563/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11563/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 15s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 mvninstall 14m 49s trunk passed +1 compile 13m 36s trunk passed +1 checkstyle 0m 29s trunk passed +1 mvnsite 1m 12s trunk passed +1 mvneclipse 0m 17s trunk passed +1 findbugs 1m 32s trunk passed +1 javadoc 0m 49s trunk passed +1 mvninstall 0m 39s the patch passed +1 compile 10m 59s the patch passed +1 javac 10m 59s the patch passed +1 checkstyle 0m 28s the patch passed +1 mvnsite 1m 0s the patch passed +1 mvneclipse 0m 19s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 36s the patch passed +1 javadoc 0m 52s the patch passed -1 unit 8m 7s hadoop-common in the patch failed. +1 asflicense 0m 36s The patch does not generate ASF License warnings. 59m 32s Reason Tests Failed junit tests hadoop.ha.TestZKFailoverController Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue HADOOP-14044 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12850664/HADOOP-14044-002.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 500b91fa3ec3 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 0914fcc Default Java 1.8.0_121 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HADOOP-Build/11563/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11563/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11563/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks for the patch Hrishikesh Gadre and the manual tests, looks pretty good to me. I agree it's not worth to write a unit test, since that would introduce a whole lot of test controls (or too many mocks....).

          Suggest to add synchronized keyword to the new syncLocalCacheWithZkState, since we're handling currentTokens there. I understand the current code is fine because the caller method cancelToken is synchronized, but adding it would be more future proof.

          And a super trivial nit: I think we can just name the new method syncLocalCacheWithZk.

          Show
          xiaochen Xiao Chen added a comment - Thanks for the patch Hrishikesh Gadre and the manual tests, looks pretty good to me. I agree it's not worth to write a unit test, since that would introduce a whole lot of test controls (or too many mocks....). Suggest to add synchronized keyword to the new syncLocalCacheWithZkState , since we're handling currentTokens there. I understand the current code is fine because the caller method cancelToken is synchronized, but adding it would be more future proof. And a super trivial nit: I think we can just name the new method syncLocalCacheWithZk .
          Hide
          hgadre Hrishikesh Gadre added a comment -

          Xiao Chen Thanks for the review. Here is an updated patch which addresses these comments.

          Show
          hgadre Hrishikesh Gadre added a comment - Xiao Chen Thanks for the review. Here is an updated patch which addresses these comments.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 13m 31s trunk passed
          +1 compile 14m 22s trunk passed
          +1 checkstyle 0m 29s trunk passed
          +1 mvnsite 1m 5s trunk passed
          +1 mvneclipse 0m 18s trunk passed
          +1 findbugs 1m 30s trunk passed
          +1 javadoc 0m 49s trunk passed
          +1 mvninstall 0m 39s the patch passed
          +1 compile 11m 37s the patch passed
          +1 javac 11m 37s the patch passed
          +1 checkstyle 0m 29s the patch passed
          +1 mvnsite 1m 2s the patch passed
          +1 mvneclipse 0m 17s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 38s the patch passed
          +1 javadoc 0m 50s the patch passed
          +1 unit 7m 55s hadoop-common in the patch passed.
          +1 asflicense 0m 33s The patch does not generate ASF License warnings.
          59m 12s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HADOOP-14044
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12850725/HADOOP-14044-003.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 57fe2804ea40 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 0914fcc
          Default Java 1.8.0_121
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11566/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11566/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 15s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 mvninstall 13m 31s trunk passed +1 compile 14m 22s trunk passed +1 checkstyle 0m 29s trunk passed +1 mvnsite 1m 5s trunk passed +1 mvneclipse 0m 18s trunk passed +1 findbugs 1m 30s trunk passed +1 javadoc 0m 49s trunk passed +1 mvninstall 0m 39s the patch passed +1 compile 11m 37s the patch passed +1 javac 11m 37s the patch passed +1 checkstyle 0m 29s the patch passed +1 mvnsite 1m 2s the patch passed +1 mvneclipse 0m 17s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 38s the patch passed +1 javadoc 0m 50s the patch passed +1 unit 7m 55s hadoop-common in the patch passed. +1 asflicense 0m 33s The patch does not generate ASF License warnings. 59m 12s Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue HADOOP-14044 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12850725/HADOOP-14044-003.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 57fe2804ea40 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 0914fcc Default Java 1.8.0_121 findbugs v3.0.0 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11566/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11566/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks Hrishikesh Gadre for revving! +1 on patch 3, will commit end of Friday.

          Could you please set target version? 2.9.0 for branch-2, 3.0.0-alpha3 for trunk. Please check http://hadoop.apache.org/releases.html for earlier versions.

          Show
          xiaochen Xiao Chen added a comment - Thanks Hrishikesh Gadre for revving! +1 on patch 3, will commit end of Friday. Could you please set target version? 2.9.0 for branch-2, 3.0.0-alpha3 for trunk. Please check http://hadoop.apache.org/releases.html for earlier versions.
          Hide
          xiaochen Xiao Chen added a comment -

          Test failure look unrelated and passes locally. Also ran TestKMS in hadoop-kms and TestDelegationtoken in hadoop-hdfs, both passed.

          Committed trunk and branch-2, thanks for reporting and fixing the issue, Hrishikesh Gadre, and nice discussions!

          Show
          xiaochen Xiao Chen added a comment - Test failure look unrelated and passes locally. Also ran TestKMS in hadoop-kms and TestDelegationtoken in hadoop-hdfs, both passed. Committed trunk and branch-2, thanks for reporting and fixing the issue, Hrishikesh Gadre , and nice discussions!
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11209 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11209/)
          HADOOP-14044. Synchronization issue in delegation token cancel (xiao: rev ba75bc759334c8987e5f7dd4b21d025f0cccbde7)

          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11209 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11209/ ) HADOOP-14044 . Synchronization issue in delegation token cancel (xiao: rev ba75bc759334c8987e5f7dd4b21d025f0cccbde7) (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java
          Hide
          vinayrpet Vinayakumar B added a comment -

          This could be branch-2.8 and branch-2.7 candidate as well.

          Show
          vinayrpet Vinayakumar B added a comment - This could be branch-2.8 and branch-2.7 candidate as well.
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks Vinayakumar B for the suggestion.

          Just pushed to branch-2.8 and branch-2.7. The cherry-pick was clean. Compiled + ran TestZKDelegationTokenSecretManager locally before pushing.

          Show
          xiaochen Xiao Chen added a comment - Thanks Vinayakumar B for the suggestion. Just pushed to branch-2.8 and branch-2.7. The cherry-pick was clean. Compiled + ran TestZKDelegationTokenSecretManager locally before pushing.

            People

            • Assignee:
              hgadre Hrishikesh Gadre
              Reporter:
              hgadre Hrishikesh Gadre
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development