  Oozie / OOZIE-2485

Oozie client keeps trying to use expired auth token

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 4.3.0
    • Component/s: client, security
    • Labels:
      None

      Description

      When using Hadoop 2.4.0 or later, the Oozie client doesn't update the auth token when it expires. The client typically doesn't give you an error because it will still fall back and authenticate via Kerberos or Pseudo. However, this is inefficient.

      This appears to be due to HADOOP-10301, which made an incompatible change to how the AuthHandler tells the Authenticator that a token has expired. It used to give a 401 when the token expired, but now it will do SPNEGO (if you have Kerberos credentials) and return a new token, all in the same call. The Oozie client's code doesn't handle that case.

      With Pseudo auth it behaves a little differently: you now get a 403 on that first call, but it doesn't give you a new token.
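
      For illustration, here is a minimal sketch of what handling the new behavior involves, using the hadoop-auth client classes. The probeToken name and the bare-connection setup are illustrative only, not the actual Oozie client code:

          import java.net.HttpURLConnection;
          import java.net.URL;

          import org.apache.hadoop.security.authentication.client.AuthenticatedURL;
          import org.apache.hadoop.security.authentication.client.AuthenticationException;

          public class TokenProbe {
              // Returns null if the cached token was rejected (401/403); otherwise the token
              // to keep using: a fresh one if the server issued it, else the cached one.
              static AuthenticatedURL.Token probeToken(URL url, AuthenticatedURL.Token cached)
                      throws Exception {
                  HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                  // Present the cached token the same way AuthenticatedURL does, via the auth cookie
                  conn.setRequestProperty("Cookie", AuthenticatedURL.AUTH_COOKIE + "=" + cached);
                  conn.connect();
                  int status = conn.getResponseCode();

                  if (status == HttpURLConnection.HTTP_UNAUTHORIZED        // pre-HADOOP-10301 behavior
                          || status == HttpURLConnection.HTTP_FORBIDDEN) { // Pseudo auth after HADOOP-10301
                      return null;                                         // token expired or tampered with
                  }
                  // Kerberos after HADOOP-10301: the call can succeed and carry a new token,
                  // so try to pick it up from the response
                  AuthenticatedURL.Token fresh = new AuthenticatedURL.Token();
                  try {
                      AuthenticatedURL.extractToken(conn, fresh);
                  } catch (AuthenticationException noNewToken) {
                      return cached;  // non-OK response or no token in it; keep the old one
                  }
                  return fresh.isSet() ? fresh : cached;
              }
          }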


          Activity

          rkanter Robert Kanter added a comment -

          Closing issue; Oozie 4.3.0 is released.

          rkanter Robert Kanter added a comment -

          Thanks for the review Rohini. Committed to master!

          rohini Rohini Palaniswamy added a comment -

          +1

          rkanter Robert Kanter added a comment -

          Test failures unrelated

          hadoopqa Hadoop QA added a comment -

          Testing JIRA OOZIE-2485

          Cleaning local git workspace

          ----------------------------

          +1 PATCH_APPLIES
          +1 CLEAN
          +1 RAW_PATCH_ANALYSIS
          . +1 the patch does not introduce any @author tags
          . +1 the patch does not introduce any tabs
          . +1 the patch does not introduce any trailing spaces
          . +1 the patch does not introduce any line longer than 132
          . +1 the patch does adds/modifies 1 testcase(s)
          +1 RAT
          . +1 the patch does not seem to introduce new RAT warnings
          +1 JAVADOC
          . +1 the patch does not seem to introduce new Javadoc warnings
          +1 COMPILE
          . +1 HEAD compiles
          . +1 patch compiles
          . +1 the patch does not seem to introduce new javac warnings
          +1 BACKWARDS_COMPATIBILITY
          . +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
          . +1 the patch does not modify JPA files
          -1 TESTS
          . Tests run: 1768
          . Tests failed: 6
          . Tests errors: 2

          . The patch failed the following testcases:

          . testActionCheckTransientDuringLauncher(org.apache.oozie.command.wf.TestActionCheckXCommand)
          . testMain(org.apache.oozie.action.hadoop.TestHiveMain)
          . testPigScript(org.apache.oozie.action.hadoop.TestPigMain)
          . testPig_withNullExternalID(org.apache.oozie.action.hadoop.TestPigMain)
          . testEmbeddedPigWithinPython(org.apache.oozie.action.hadoop.TestPigMain)
          . testPigScript(org.apache.oozie.action.hadoop.TestPigMainWithOldAPI)

          +1 DISTRO
          . +1 distro tarball builds with the patch

          ----------------------------
          -1 Overall result, please check the reported -1(s)

          The full output of the test-patch run is available at

          . https://builds.apache.org/job/oozie-trunk-precommit-build/2793/

          rkanter Robert Kanter added a comment -

           Before I explain what the patch does, here’s what the code is currently doing, which wasn’t super-obvious to me at first (a rough sketch of the flow follows the list):

          1. Read in the cached token, if enabled
          2. If we have a token, make a connection to the server to test if the token is still valid (i.e. hasn’t expired or been tampered with)
            • If it’s not valid, then let’s delete it
          3. If we don’t have a token (or it was deleted in the previous step), get a new one from the Server using Kerberos or Pseudo auth (which uses a new connection under the covers)
          4. If we now have a new token, write it to the cache file (if enabled)
          5. Create a new connection using the token and return it up to the caller, which may further modify it and then use it
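
           A rough outline of that flow in code, for reference only; the abstract helpers are hypothetical placeholders, not actual AuthOozieClient methods:

             import java.net.HttpURLConnection;
             import java.net.URL;

             import org.apache.hadoop.security.authentication.client.AuthenticatedURL;

             // Hypothetical outline of the cached-token flow described in the list above
             abstract class CachedTokenFlow {
                 abstract AuthenticatedURL.Token readCachedToken();                                    // step 1
                 abstract boolean isValidOnServer(URL url, AuthenticatedURL.Token t) throws Exception; // step 2
                 abstract void deleteCachedToken();
                 abstract AuthenticatedURL.Token authenticate(URL url) throws Exception;               // step 3 (Kerberos/Pseudo)
                 abstract void writeCachedToken(AuthenticatedURL.Token t);                             // step 4

                 HttpURLConnection createConnection(URL url) throws Exception {
                     AuthenticatedURL.Token token = readCachedToken();
                     if (token != null && !isValidOnServer(url, token)) {
                         deleteCachedToken();        // invalid: drop it and fall through
                         token = null;
                     }
                     if (token == null) {
                         token = authenticate(url);  // uses a new connection under the covers
                         writeCachedToken(token);
                     }
                     // step 5: the caller may further modify this connection before using it
                     return new AuthenticatedURL().openConnection(url, token);
                 }
             }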

           As you can see, we may make 2-3 connections per REST call. This is partly due to how the code is set up: the auth client code passes the connection to other code that doesn’t know about auth. While that could be changed, it would require a lot of refactoring and changes to how the OozieClient and AuthOozieClient are set up.

          The main new thing is the else statement with the big comment. Most of the other stuff is some minor refactoring, some helpful comments, and an optimization that also prevents a rare race condition. Here’s a summary of what I changed:

           • We can save a call to the server by parsing and checking the expiration time of the token locally; if it’s expired, we can just delete it (see the sketch after this list). I also treat a token as expired 5 minutes early to prevent a race condition (though I’ve never seen it occur) where an almost-expired token would pass the server check but expire by the time the actual command is made. Given that this should all happen quickly, 5 minutes seems like more than a safe buffer.
           • One of the changes made by HADOOP-10301 is that PseudoAuthenticationHandler will now return a 403 instead of a 401 when the token is invalid. The patch now checks for both 401 and 403 to handle either code.
           • The other change made by HADOOP-10301 is that KerberosAuthenticationHandler will now return a 200 instead of a 401 when the token is invalid; plus, it will give you a new token. In this case, the patch now tries to extract the new token if it’s there. I added a big comment explaining this.
          • I replaced
            new AuthenticatedURL(authenticator).openConnection(url, currentToken);
            

            with

            authenticator.authenticate(url, currentToken);
            

             because it’s clearer and saves creating an unused connection. If you look at the AuthenticatedURL code, it just calls authenticate and then creates a new connection that we’re not using here.
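
           A minimal sketch of the local expiration check with the 5-minute buffer, assuming the serialized token carries its expiration time as an "e=<millis-since-epoch>" field (which is how hadoop-auth writes its tokens); the class and method names are illustrative, not the patch itself:

             import org.apache.hadoop.security.authentication.client.AuthenticatedURL;

             final class TokenExpiryCheck {
                 // Treat an "almost expired" token as already expired to avoid the race described above
                 private static final long BUFFER_MS = 5 * 60 * 1000L;

                 static boolean isExpired(AuthenticatedURL.Token token) {
                     for (String field : token.toString().split("&")) {
                         if (field.startsWith("e=")) {
                             long expiresAt = Long.parseLong(field.substring(2));
                             return expiresAt - BUFFER_MS < System.currentTimeMillis();
                         }
                     }
                     return false;  // no expiration field found; fall back to the server-side check
                 }
             }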

           I think checking the expiration time locally should only be a problem if the local clock is way off (in which case other things like Kerberos won’t work anyway). If the clock is wrong, either it won’t delete the token, in which case the check against the server will take care of it, or it will delete the token early, in which case we’ll just get a new one. So I think this should be fine, but if others don’t think we should do this, I’m fine with removing it.

           I wasn't able to get MiniKDC working. It would be good to have the test working with Kerberos, but I don't think that should block getting this fix in. Once this is in, I'll create another JIRA and upload what I have so far there. Maybe someone else can figure it out.

          rkanter Robert Kanter added a comment -

          I spent a ton of time looking into this. Unfortunately, it's very tricky because different versions of hadoop-auth do slightly different things. I have a fix that I believe works for everything. I'm currently trying to get MiniKDC working so I can have unit tests that use Kerberos for this. I'll try to post a patch by the end of the week; if I can't get MiniKDC working by then, I'll give up on it.


            People

            • Assignee:
              rkanter Robert Kanter
              Reporter:
              rkanter Robert Kanter
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:
