Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha, 3.0.0, 0.23.7
    • Fix Version/s: 0.23.7, 2.0.4-alpha
    • Component/s: None
    • Labels: None

      Description

      When the webhdfs SPNEGO token expires, the fs doesn't attempt to renegotiate a new SPNEGO token. This renders webhdfs unusable for daemons that are logged in via a keytab, which would allow a new SPNEGO token to be generated.

      1. HDFS-4548.patch
        2 kB
        Daryn Sharp
      2. HDFS-4548.branch-23.patch
        0.9 kB
        Daryn Sharp
      3. HDFS-4548.patch
        1 kB
        Daryn Sharp
      4. HDFS-4548.branch-23.patch
        3 kB
        Daryn Sharp
      5. HDFS-4548.branch-23.patch
        1 kB
        Daryn Sharp
      6. HDFS-4548.patch
        1 kB
        Daryn Sharp
      7. HDFS-4548.branch-23.patch
        2 kB
        Daryn Sharp
      8. HDFS-4548.patch
        2 kB
        Daryn Sharp
      9. HDFS-4548.branch-23.patch
        2 kB
        Daryn Sharp
      10. HDFS-4548.patch
        2 kB
        Daryn Sharp

        Issue Links

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1391 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1391/)
          HDFS-4548. Webhdfs doesn't renegotiate SPNEGO token. Contributed by Daryn Sharp. (Revision 1464548)

          Result = SUCCESS
          kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464548
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1364 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1364/)
          HDFS-4548. Webhdfs doesn't renegotiate SPNEGO token. Contributed by Daryn Sharp. (Revision 1464548)

          Result = FAILURE
          kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464548
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #573 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/573/)
          HDFS-4548. Webhdfs doesn't renegotiate SPNEGO token. Contributed by Daryn Sharp. (Revision 1464556)

          Result = SUCCESS
          kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464556
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #175 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/175/)
          HDFS-4548. Webhdfs doesn't renegotiate SPNEGO token. Contributed by Daryn Sharp. (Revision 1464548)

          Result = SUCCESS
          kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464548
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
          Kihwal Lee added a comment -

          I've committed to trunk, branch-2 and branch-0.23.
          Thanks all for the work!

          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3563 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3563/)
          HDFS-4548. Webhdfs doesn't renegotiate SPNEGO token. Contributed by Daryn Sharp. (Revision 1464548)

          Result = SUCCESS
          kihwal : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1464548
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
          Alejandro Abdelnur added a comment -

          Daryn, I'm OK with the fix itself (you have my +1). It's just that we've had a string of JIRAs 'tweaking' things here and there for DelegationToken and SPNEGO instead of taking a step back and fixing things properly.

          Daryn Sharp added a comment -

          [...] not to commit changes that will be undone right away.

          Regarding this point, all we're debating is the one line I moved from the renew and cancel token operations into the connection open. This is to ensure that getting a token also uses a valid TGT instead of implicitly assuming something else refreshed the TGT. In no way did I really change the pre-existing behavior, and it's the same long-standing behavior as hftp. Any change would be an enhancement that shouldn't block this jira.

          More info on why UGI works the way it does: The renewal thread runs for ticket cache TGTs because it must renew before the TGT expires or it's game over - a new TGT can't be acquired without the user's creds. Keytab logins do a lazy refresh of TGTs because they can acquire a new TGT with the keytab creds.
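A runnable sketch of the distinction described above (class, method, and constant names here are invented; the real logic lives in UserGroupInformation):

```java
// Hypothetical sketch: keytab logins can refresh a TGT lazily on demand,
// while ticket-cache logins depend on a background renewal thread because
// a new TGT can't be acquired once the cached one has expired.
public class ReloginSketch {
    static final long TICKET_LIFETIME_MS = 10L * 60 * 60 * 1000; // ~10h TGT

    private final boolean fromKeytab;
    private long tgtExpiresAtMs;

    ReloginSketch(boolean fromKeytab, long tgtExpiresAtMs) {
        this.fromKeytab = fromKeytab;
        this.tgtExpiresAtMs = tgtExpiresAtMs;
    }

    /** Called before each connection attempt; returns true if we relogged in. */
    boolean checkTgtAndRelogin(long nowMs) {
        if (fromKeytab && nowMs >= tgtExpiresAtMs) {
            // The keytab holds long-lived creds, so we can simply log in again.
            tgtExpiresAtMs = nowMs + TICKET_LIFETIME_MS;
            return true;
        }
        // Ticket-cache TGT: nothing we can do here without the user's creds.
        return false;
    }

    long expiresAt() { return tgtExpiresAtMs; }
}
```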

          Daryn Sharp added a comment -

          then the JDK does the renewal for you. that is how hadoop-auth works on the server side.

          Hmm, your experience with hadoop-auth and the JDK automatically renewing TGTs made me doubt myself. I looked at the source for Krb5LoginModule and the renewTGT option is only used inside a conditional for the ticket cache. If enabled, and a TGT is in the ticket cache, it will issue a one time renewal. If it's from a keytab, no renewal is performed. Do you know where it's scheduling future renewals?

          Back to UGI, UGI has a thread that triggers the relogin, why do we need to call it explicitly?

          The UGI renewal thread is only spawned for ticket cache logins, not keytab logins. That's why hftp, webhdfs, and RPC have to check if a keytab user needs to be re-logged in. It's less than ideal, and I'd like to make it better, but it's a tangent to this blocker...

          Alejandro Abdelnur added a comment -

          If you have the following configuration set:

          • keytab=<FILE>
          • principal=<PRINCIPAL>
          • useKeyTab=true
          • useTicketCache=true
          • renewTGT=true

          then the JDK does the renewal for you. that is how hadoop-auth works on the server side.

          Back to UGI, UGI has a thread that triggers the relogin, why do we need to call it explicitly?
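For reference, an equivalent JAAS configuration entry might look like the following (the entry name, keytab path, and principal are placeholders; note the real Krb5LoginModule option is spelled keyTab):

```
ServerLogin {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/etc/security/keytabs/service.keytab"
    principal="HTTP/host.example.com@EXAMPLE.COM"
    useTicketCache=true
    renewTGT=true
    storeKey=true;
};
```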

          Daryn Sharp added a comment -

          That's not how it works, so I believe hadoop-auth may be working only because something else is quietly doing the relogin...

          The renewTGT option is only applicable when using a ticket cache. It will fail if the ticket cache option is not enabled. The option causes a TGT obtained from the ticket cache during login to be renewed before it's stuffed into the Subject. Afterwards, there is no automatic background renewal triggered by this option. You have to relogin via a LoginContext to allow the kerberos login module to do the renewal.

          The UGI has relogin logic for both the ticket cache and the keytab. Relogin from the ticket cache triggers renewTGT upon re-login. Relogin from the keytab gets a new TGT. The latter is critical for daemons. RPC automatically issues a relogin on connection errors, so webhdfs, just like hftp, must do the relogin itself.

          I haven't changed the behavior of webhdfs, but rather moved relogin to a common place. The goal here is minimal change to make webhdfs usable beyond 10h. The proposed changes appear predicated on a misunderstanding, so are you ok with this patch?

          (Aside: I already plan to streamline all the relogin methods into a single relogin as part of my stalled, but soon to be resumed, SASL work)

          Alejandro Abdelnur added a comment -

          Agreed, we can do that in a different JIRA, though I would mark this one as dependent on that new JIRA so as not to commit changes that will be undone right away.

          Kihwal Lee added a comment -

          The UGI should, when using a keytab, set renewTGT=true. This will make the JDK renew the ticket automatically.

          There are other places something similar is done: GetImageServlet and SecondaryNameNode call reloginFromKeytab(), not even checkTGTAndReloginFromKeytab(). If what you suggested works, we can fix them too. We could perhaps do that in a separate jira.

          Alejandro Abdelnur added a comment -

          Looking a bit more ....

          I think WebHdfsFileSystem should not trigger a relogin from keytab.

          The UGI should, when using a keytab, set renewTGT=true. This will make the JDK renew the ticket automatically. As a reference, hadoop-auth does this for the KerberosAuthenticationHandler.

          AFAIK, the logic in UGI to force a relogin is there for kinit-ed sessions, which can't be relogged in automatically by the JDK.

          So, it seems to me, the fix should be:

          • remove all checkTGTAndReloginFromKeytab() calls from WebHdfsFileSystem
          • make UGI.checkTGTAndReloginFromKeytab() a no-op (for backwards compat)
          • make UGI set renewTGT=true for keytab sessions
          Kihwal Lee added a comment -

          +1 The fix seems simple and straightforward.

          Daryn Sharp added a comment -

          This really is a blocker because webhdfs becomes unusable by daemons after 10h.

          Daryn Sharp added a comment -

          Oops, HADOOP-9357.

          Daryn Sharp added a comment -

          The test failure is reporting "HDFS requires URIs with schemes have an authority" which I think is being caused by HDFS-9357.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12575763/HDFS-4548.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.fs.TestFcHdfsSymlink

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/4162//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4162//console

          This message is automatically generated.

          Daryn Sharp added a comment -

          Same as the previous patches, just made from the top level instead of one directory deep. Tests aren't feasible because Kerberos is required to activate the code paths, but the fixes have been verified on internal clusters blocked by this issue.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12573721/HDFS-4548.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4098//console

          This message is automatically generated.

          Daryn Sharp added a comment -

          Updated patches that will negotiate a SPNEGO token as needed for secure auth connections (token operations).

          Daryn Sharp added a comment -

          Webhdfs only uses an authenticated URL connection for token operations, which, relatively speaking, don't occur very often, especially from a client - once for "hadoop fs" and once for job submission. The simplest solution is to SPNEGO-negotiate the token connections.

          This patch will intersect with HDFS-3367, so I am not submitting it yet since I will need to amend it.

          Daryn Sharp added a comment -

          Valid points. Currently webhdfs does both SPNEGO and token auth for all non-token operations. Maybe what I should do is make the authenticated URL code be used only by token operations. All other operations do not require SPNEGO; they just need a token. The token ops don't send data payloads, so the retry-ability of those operations when the SPNEGO token goes bad is not an issue.

          Alejandro Abdelnur added a comment -

          I'm not sure I understand the data passthrough issue ...

          (Sorry for the delay answering this)

          What I meant is that if you have an app reading data from a stream and trying to write it to WebHDFS, because of how the Java HTTP implementation works, the actual connection will not be attempted until some data is written to the HTTP stream. To retry, the app would have to replay the written data, which means it has to cache it until it is flushed.
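The replay requirement can be sketched with a hypothetical helper (not WebHDFS code): the upload source is buffered up front so the same bytes can be re-sent if the connection must be retried after renegotiation.

```java
// Hypothetical sketch: to make an upload retryable, the caller must
// buffer the source bytes before writing them to the HTTP stream.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ReplayableUpload {
    /** Drains the source into a buffer so the payload can be replayed. */
    static byte[] bufferForReplay(InputStream src) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = src.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    /** Writes the buffered payload; safe to call again after an auth retry. */
    static void upload(OutputStream conn, byte[] payload) throws IOException {
        conn.write(payload);
    }
}
```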

          Daryn Sharp added a comment -

          This patch does the "right wrong thing" based on webhdfs mishandling response codes.

          Daryn Sharp added a comment -

          Patch for branch 23 to retry at a higher level when an AuthenticationException occurs. This will handle authentication exceptions either during connect or in response to the operations.

          It's not very pretty because AuthenticationException can originate from multiple levels, sometimes wrapped in an IOException. I'd rather not change method signatures for 23, but it should be cleaner in trunk since AuthenticationException appears to bubble up higher.

          Since the code is directly tied to kerberos, it's not possible to write tests. I have verified existing webhdfs tests pass, that webhdfs still works on a secure cluster, and that the re-attempt works on a secure cluster by instrumenting the code to throw an exception on the first connect or first validation of a response.

          Trunk patch forthcoming.

          Daryn Sharp added a comment -

          BTW, earlier I filed HADOOP-9366 for the mutable hashCode. I think we probably need a hashCode; otherwise we can never use the token correctly in a collection.

          I'm not sure I understand the data passthrough issue, would you elaborate? This low level area is just handling the open of connections so higher level callers can send the actual http operation. The retry policies must be able to handle redoing the whole operation (including upload) or they are fundamentally broken?

          I realized these patches are incomplete. They only handle the case of actual negotiation failing. If the token is acquired but expired, operations get a 401 response which I'm not handling...

          Alejandro Abdelnur added a comment -

          hashCode changing is not good. It seems we are using it only in test cases; maybe we should just remove it.

          Why GET? Because a renewal will ignore anything being uploaded (if PUT/POST), thus avoiding having to handle retries for upload operations, which may not be easily redoable if you are doing a data passthrough.

          Daryn Sharp added a comment -

          I thought about allowing a token to be reset, but its hashCode is based on the token. Come to think of it, it's a bug that the hash code is 0 if not set, but changes once set.

          The SPNEGO token is used for getting (GET) & renewing (PUT) delegation tokens, so why should we only retry on GET?
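The hazard being discussed can be demonstrated with a hypothetical Token stand-in: once the hash code changes after insertion, the object can no longer be found in a HashSet.

```java
// Hypothetical stand-in for the auth token: hashCode is 0 until the
// value is set, then changes - which strands the object in a HashSet.
public class MutableHashCodeDemo {
    static final class Token {
        private String value;                   // null until negotiated
        void set(String v) { value = v; }
        @Override public int hashCode() {
            return value == null ? 0 : value.hashCode();
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Token)) return false;
            Token t = (Token) o;
            return value == null ? t.value == null : value.equals(t.value);
        }
    }

    public static void main(String[] args) {
        java.util.Set<Token> set = new java.util.HashSet<>();
        Token t = new Token();
        set.add(t);          // stored in the bucket for hashCode() == 0
        t.set("spnego");     // hashCode changes after insertion
        System.out.println(set.contains(t));   // false: wrong bucket probed
    }
}
```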

          Daryn Sharp added a comment -

          Linking to jira regarding NPEs to acquire a SPNEGO token.

          Alejandro Abdelnur added a comment -

          Token is an inner class of AuthenticatedURL; we could add a private reset() method and the retry logic.

          Regardless where this is done, we should make sure the retry on renegotiation is a GET request.

          Daryn Sharp added a comment -

          I generally agree, but given how the code is written I don't think that's easily possible. Webhdfs is managing the token and the token is considered immutable once set, so there isn't a way for AuthenticatedURL to update the token... I'm trying to shoot for the minimal change to make webhdfs usable by daemons. I'm still trying to test this. :/

          Alejandro Abdelnur added a comment -

          IMO, the renegotiate logic is more of an AuthenticatedURL#openConnection() thing than a WebHDFS thing, WebHDFS should get it for free.

          Daryn Sharp added a comment -

          Patch is a bit smaller, and 23 is a bit different due to lack of formal retry policies.

          Trying to write tests.

          Daryn Sharp added a comment -

          The basic approach: if an AuthenticationException is encountered while holding a SPNEGO token, blank it out and try again. If the second attempt fails, error out.

          I need to figure out how to write tests for this change, but please provide feedback on whether this is a viable approach.
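The approach can be sketched generically (all names here are invented; the actual change is in WebHdfsFileSystem and its AuthenticatedURL usage, and the real exception type is hadoop-auth's AuthenticationException):

```java
// Hypothetical sketch: on an authentication failure, reset the cached
// SPNEGO token so the next attempt renegotiates, and retry exactly once.
import java.io.IOException;
import java.util.concurrent.Callable;

public class SpnegoRetrySketch {
    /** Stand-in for the auth failure raised on an expired SPNEGO token. */
    static class AuthenticationException extends IOException {
        AuthenticationException(String msg) { super(msg); }
    }

    /** Holds the cached SPNEGO token; blanked to force renegotiation. */
    static final class TokenHolder {
        String token = "expired-spnego-token";
        void reset() { token = null; }   // next connect renegotiates
    }

    static <T> T runWithAuthRetry(TokenHolder holder, Callable<T> op)
            throws Exception {
        try {
            return op.call();
        } catch (AuthenticationException e) {
            holder.reset();              // drop the stale token...
            return op.call();            // ...and retry once; else error out
        }
    }
}
```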


            People

            • Assignee:
              Daryn Sharp
              Reporter:
              Daryn Sharp
            • Votes:
              0
              Watchers:
              9

              Dates

              • Created:
                Updated:
                Resolved:
