Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-6222

Remove background token renewer from webhdfs

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha, 3.0.0
    • Fix Version/s: 2.5.0
    • Component/s: webhdfs
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The background token renewer is a source of problems for long-running daemons. Webhdfs should lazy fetch a new token when it receives an InvalidToken exception.

      1. HDFS-6222.branch-2.patch
        23 kB
        Daryn Sharp
      2. HDFS-6222.branch-2.patch
        23 kB
        Daryn Sharp
      3. HDFS-6222.branch-2-v2.patch
        28 kB
        Rushabh S Shah
      4. HDFS-6222.branch-2-v3.patch
        28 kB
        Rushabh S Shah
      5. HDFS-6222.trunk.patch
        23 kB
        Daryn Sharp
      6. HDFS-6222.trunk.patch
        23 kB
        Daryn Sharp
      7. HDFS-6222.trunk-v2.patch
        28 kB
        Rushabh S Shah
      8. HDFS-6222.trunk-v2.patch
        28 kB
        Rushabh S Shah
      9. HDFS-6222.trunk-v3.patch
        27 kB
        Rushabh S Shah

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1808 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1808/)
          HDFS-6222. Remove background token renewer from webhdfs. Contributed by Rushabh Shah and Daryn Sharp. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604300)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenIdentifier.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/SWebHdfsFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenIdentifier
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1808 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1808/ ) HDFS-6222 . Remove background token renewer from webhdfs. Contributed by Rushabh Shah and Daryn Sharp. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604300 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenIdentifier.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/SWebHdfsFileSystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenIdentifier /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #1781 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1781/)
          HDFS-6222. Remove background token renewer from webhdfs. Contributed by Rushabh Shah and Daryn Sharp. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604300)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenIdentifier.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/SWebHdfsFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenIdentifier
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #1781 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1781/ ) HDFS-6222 . Remove background token renewer from webhdfs. Contributed by Rushabh Shah and Daryn Sharp. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604300 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenIdentifier.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/SWebHdfsFileSystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenIdentifier /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #590 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/590/)
          HDFS-6222. Remove background token renewer from webhdfs. Contributed by Rushabh Shah and Daryn Sharp. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604300)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenIdentifier.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/SWebHdfsFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenIdentifier
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #590 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/590/ ) HDFS-6222 . Remove background token renewer from webhdfs. Contributed by Rushabh Shah and Daryn Sharp. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604300 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenIdentifier.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/SWebHdfsFileSystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenIdentifier /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #5747 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5747/)
          HDFS-6222. Remove background token renewer from webhdfs. Contributed by Rushabh Shah and Daryn Sharp. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604300)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenIdentifier.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/SWebHdfsFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenIdentifier
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #5747 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5747/ ) HDFS-6222 . Remove background token renewer from webhdfs. Contributed by Rushabh Shah and Daryn Sharp. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1604300 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenIdentifier.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/SWebHdfsFileSystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/META-INF/services/org.apache.hadoop.security.token.TokenIdentifier /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java
          Hide
          Chris Nauroth added a comment -

          I have committed this to trunk and branch-2. I co-credited the path to both Rushabh S Shah and Daryn Sharp. Daryn, thank you for the initial patch. Rushabh, thank you for incorporating the code review feedback.

          Show
          Chris Nauroth added a comment - I have committed this to trunk and branch-2. I co-credited the path to both Rushabh S Shah and Daryn Sharp . Daryn, thank you for the initial patch. Rushabh, thank you for incorporating the code review feedback.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12651716/HDFS-6222.branch-2-v3.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7191//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651716/HDFS-6222.branch-2-v3.patch against trunk revision . -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7191//console This message is automatically generated.
          Hide
          Rushabh S Shah added a comment -

          Thanks Chris for the review.
          Please find attached the revised patch for branch-2.

          Show
          Rushabh S Shah added a comment - Thanks Chris for the review. Please find attached the revised patch for branch-2.
          Hide
          Chris Nauroth added a comment -

          +1 for the patch. Rushabh, thank you for addressing the feedback.

          Can you please also provide an up-to-date branch-2 patch? I'd be happy to take care of committing this for you.

          Show
          Chris Nauroth added a comment - +1 for the patch. Rushabh, thank you for addressing the feedback. Can you please also provide an up-to-date branch-2 patch? I'd be happy to take care of committing this for you.
          Hide
          Rushabh S Shah added a comment -

          The failed test runs fine on my machine.

          Show
          Rushabh S Shah added a comment - The failed test runs fine on my machine.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12651662/HDFS-6222.trunk-v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7189//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7189//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651662/HDFS-6222.trunk-v3.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7189//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7189//console This message is automatically generated.
          Hide
          Rushabh S Shah added a comment -

          Fix findbugs warnings

          Show
          Rushabh S Shah added a comment - Fix findbugs warnings
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12651547/HDFS-6222.trunk-v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7182//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7182//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7182//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651547/HDFS-6222.trunk-v2.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7182//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7182//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7182//console This message is automatically generated.
          Hide
          Rushabh S Shah added a comment -

          Uploading the trunk patch only.

          Show
          Rushabh S Shah added a comment - Uploading the trunk patch only.
          Hide
          Rushabh S Shah added a comment -

          Uploaded patches for branch-2 and trunk.
          Jenkins took the branch-2 patch and commented that it could not apply the patch against trunk revision.
          So cancelling the patch.

          Show
          Rushabh S Shah added a comment - Uploaded patches for branch-2 and trunk. Jenkins took the branch-2 patch and commented that it could not apply the patch against trunk revision. So cancelling the patch.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12651519/HDFS-6222.branch-2-v2.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7180//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651519/HDFS-6222.branch-2-v2.patch against trunk revision . -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7180//console This message is automatically generated.
          Hide
          Rushabh S Shah added a comment -

          Added a bunch of assert statements for token kind and added a test case for SWebHdfsFileSystem

          Show
          Rushabh S Shah added a comment - Added a bunch of assert statements for token kind and added a test case for SWebHdfsFileSystem
          Hide
          Rushabh S Shah added a comment -

          On behalf of daryn:
          Cancelling this patch in order to incorporate Chris's comments

          Show
          Rushabh S Shah added a comment - On behalf of daryn: Cancelling this patch in order to incorporate Chris's comments
          Hide
          Chris Nauroth added a comment -

          Thanks for the response. This all makes sense.

          A secure client is supposed to be able to talk to an insecure server...

          Now I get it. Would you mind changing that comment to "security is disabled on server" to make this clearer?

          Thanks for checking earlier versions for the compatibility concern.

          TokenAspect is still used by hftp...

          I forgot about this. I had been looking at trunk, where HftpFileSystem is gone.

          I'm not sure how to test swebhdfs...

          There is an example in TestHttpsFileSystem, and there are some helper methods that take care of bootstrapping configuration, including SSL certs.

          Show
          Chris Nauroth added a comment - Thanks for the response. This all makes sense. A secure client is supposed to be able to talk to an insecure server... Now I get it. Would you mind changing that comment to "security is disabled on server" to make this clearer? Thanks for checking earlier versions for the compatibility concern. TokenAspect is still used by hftp... I forgot about this. I had been looking at trunk, where HftpFileSystem is gone. I'm not sure how to test swebhdfs... There is an example in TestHttpsFileSystem , and there are some helper methods that take care of bootstrapping configuration, including SSL certs.
          Hide
          Daryn Sharp added a comment -

          Good questions. This design is the result of problems encountered by converting a mission critical production system to use webhdfs. We've been internally running in production for months with this change on 0.23, and a sandbox 2.x grid. A few of the issues: The renewer is hardcoded to assume 24h, which isn't a guarantee by any means. The filesystem can go dead for up to a day. Decreasing the token renewal on our QA clusters to 30s to stress token handling obviously didn't work either... We've also encountered class loader leaks. Filesystems would become unusable if the token expired, erroneously cancelled, or transient renewal failures such as during a NN restart.

          1. A secure client is supposed to be able to talk to an insecure server which is why earlier logic had this same behavior. Regarding malformed responses, NPEs used to be generated, not null returns. My earlier work trapped and converted the NPEs, which in this case will trigger the retry loop. Unlike the current implementation, the fs will attempt to re-acquire a token even after one operation fails which prevents the fs from becoming unusable -
          2. Yes, very unfortunate, but I only did it for backwards compatibility with NNs, and also cross-compatibility with DNs that don't munge the token exception. I checked earlier versions and it appears to have always been this way.
          3. True. We've become very performance conscience, but token renewal is infrequent if ever required by non-daemons so I consider the tiny latency worth the robustness.
          4. TokenAspect is still used by hftp or I would have happily removed it...
          5. I think this was covered by other tests. I'll double check and add if necessary. I'm not sure how to test swebhdfs since it requires extra configuration and ssl certs to function...
          Show
          Daryn Sharp added a comment - Good questions. This design is the result of problems encountered by converting a mission critical production system to use webhdfs. We've been internally running in production for months with this change on 0.23, and a sandbox 2.x grid. A few of the issues: The renewer is hardcoded to assume 24h, which isn't a guarantee by any means. The filesystem can go dead for up to a day. Decreasing the token renewal on our QA clusters to 30s to stress token handling obviously didn't work either... We've also encountered class loader leaks. Filesystems would become unusable if the token expired, erroneously cancelled, or transient renewal failures such as during a NN restart. A secure client is supposed to be able to talk to an insecure server which is why earlier logic had this same behavior. Regarding malformed responses, NPEs used to be generated, not null returns. My earlier work trapped and converted the NPEs, which in this case will trigger the retry loop. Unlike the current implementation, the fs will attempt to re-acquire a token even after one operation fails which prevents the fs from becoming unusable - Yes, very unfortunate, but I only did it for backwards compatibility with NNs, and also cross-compatibility with DNs that don't munge the token exception. I checked earlier versions and it appears to have always been this way. True. We've become very performance conscience, but token renewal is infrequent if ever required by non-daemons so I consider the tiny latency worth the robustness. TokenAspect is still used by hftp or I would have happily removed it... I think this was covered by other tests. I'll double check and add if necessary. I'm not sure how to test swebhdfs since it requires extra configuration and ssl certs to function...
          Hide
          Chris Nauroth added a comment -

          Hi, Daryn Sharp. Can you describe what kinds of problems you've seen "for long-running daemons"? Is there a race condition where the renewer thread might not do the renewal ahead of the client operation that needs it? Something else?

          A couple of questions/observations:

          1. I have a question on this logic:
                  } else {
                    token = getDelegationToken(null);
                    if (token != null) {
                      LOG.debug("Fetched new token: " + token);
                    } else { // security is disabled
                      canRefreshDelegationToken = false;
                    }
                  }
            

            In the second else block, I don't see how we can reach this code path when security is disabled, because it's already guarded by an earlier check of canRefreshDelegationToken, which is initialized from UserGroupInformation#isSecurityEnabled. Also, I seem to recall you mentioned to me offline seeing certain failures where a malformed response from the server can cause the JSON parser to return a null object. If that happens, then this logic would turn off future attempts to refresh. Would that effectively leave the file system instance in an unusable state, with the only workaround being to discard the instance and get a new one?

          2. It's unfortunate that we have to parse the exception strings, but I don't see an alternative. Do you know if these exception messages are consistent across prior versions? Is there any chance that a 2.5.0 WebHdfsFileSystem instance would be incompatible with servers running earlier versions back to 2.2.0?
          3. The latency of token renewal is now synchronous with the client operation. It's probably no big deal if your internal testing on secure clusters didn't show a problem.
          4. Unless I'm misreading, it seems like much of TokenAspect has become dead code. Is there a need for further clean-up?
          5. Thanks for adding those tests. They're very thorough. Do you think we need to add assertions that we're getting the correct token kind? Is it also worthwhile to add some tests through swebhdfs too, and assert that they get SWebHdfsFileSystem#TOKEN_KIND?
          Show
          Chris Nauroth added a comment - Hi, Daryn Sharp . Can you describe what kinds of problems you've seen "for long-running daemons"? Is there a race condition where the renewer thread might not do the renewal ahead of the client operation that needs it? Something else? A couple of questions/observations: I have a question on this logic: } else { token = getDelegationToken( null ); if (token != null ) { LOG.debug( "Fetched new token: " + token); } else { // security is disabled canRefreshDelegationToken = false ; } } In the second else block, I don't see how we can reach this code path when security is disabled, because it's already guarded by an earlier check of canRefreshDelegationToken , which is initialized from UserGroupInformation#isSecurityEnabled . Also, I seem to recall you mentioned to me offline seeing certain failures where a malformed response from the server can cause the JSON parser to return a null object. If that happens, then this logic would turn off future attempts to refresh. Would that effectively leave the file system instance in an unusable state, with the only workaround being to discard the instance and get a new one? It's unfortunate that we have to parse the exception strings, but I don't see an alternative. Do you know if these exception messages are consistent across prior versions? Is there any chance that a 2.5.0 WebHdfsFileSystem instance would be incompatible with servers running earlier versions back to 2.2.0? The latency of token renewal is now synchronous with the client operation. It's probably no big deal if your internal testing on secure clusters didn't show a problem. Unless I'm misreading, it seems like much of TokenAspect has become dead code. Is there a need for further clean-up? Thanks for adding those tests. They're very thorough. Do you think we need to add assertions that we're getting the correct token kind? Is it also worthwhile to add some tests through swebhdfs too, and assert that they get SWebHdfsFileSystem#TOKEN_KIND ?
          Hide
          Daryn Sharp added a comment -

          The test failure is unrelated.

          Show
          Daryn Sharp added a comment - The test failure is unrelated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12647613/HDFS-6222.trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7014//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7014//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647613/HDFS-6222.trunk.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7014//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7014//console This message is automatically generated.
          Hide
          Daryn Sharp added a comment -

          Fix findbugs warnings.

          Show
          Daryn Sharp added a comment - Fix findbugs warnings.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12646923/HDFS-6222.trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6980//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6980//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6980//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646923/HDFS-6222.trunk.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6980//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6980//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6980//console This message is automatically generated.
          Hide
          Daryn Sharp added a comment -

          Adds lazy re-refetch of expired tokens. Internally tested on secure clusters. Only difference in patches is a conflicting import.

          Show
          Daryn Sharp added a comment - Adds lazy re-refetch of expired tokens. Internally tested on secure clusters. Only difference in patches is a conflicting import.

            People

            • Assignee:
              Daryn Sharp
              Reporter:
              Daryn Sharp
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development