  Hadoop Common / HADOOP-10959

A Kerberos based token authentication approach

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: security
    • Labels:

      Description

      To implement and integrate pluggable authentication providers, provide the single sign-on that end users want, and help enforce centralized access control on the platform, the community has widely discussed the options and concluded that token-based authentication could be the appropriate approach. TokenAuth (HADOOP-9392) was proposed and is under development to implement another authentication method alongside Simple and Kerberos. Supporting TokenAuth across the entire ecosystem is a large, long-term effort. Here we propose a short-term alternative based on Kerberos that can complement TokenAuth. Our solution involves fewer code changes with limited risk, and the main development work has already been done in our proof of concept (POC). Users can adopt our solution in the short term to support tokens inside Hadoop.

      This effort and the resulting solution will be fully described in the design document to be attached. A brief introduction will follow in a comment.


          Activity

          Kai Zheng (drankye) added a comment:

          We're working on Apache Kerby, and the new mechanism, TokenPreauth, will be available in about a month.

          Kai Zheng (drankye) added a comment:

          Status update.

          Haox was accepted by ApacheDS, and Apache Kerby was launched. We're working on Kerby and implementing the tokenPreauth mechanism there first. Once the major work is done there, we'll return here to implement Kerberos-based token support for Hadoop by leveraging Kerby.

          Kai Zheng (drankye) added a comment:

          Just a quick update.

          We're working with the MIT Kerberos team on defining the token-preauth and access-token-profile drafts. As this is low priority, progress is very slow.

          Meanwhile, we have also started the Haox project, which targets a Java Kerberos implementation; based on it, we plan to prototype the Kerberos extensions in the near future.

          https://github.com/drankye/haox

          Kai Zheng (drankye) added a comment:

          Undermine as in bypassing a password/keytab means kerberos is no longer the source of truth for passwords.

          In Kerberos, passwords for end users have their own weaknesses and inconveniences. That's why Kerberos today is evolving to support other credentials or approaches to prove a principal's identity: PKINIT (X.509 certificates), OTP (one-time passwords), S4U (protocol transition), and now the newly proposed token-preauth (JWT tokens). None of these extensions undermines Kerberos in the sense you mean. MIT Kerberos is keen to support JWT tokens, which could open the door to supporting OAuth 2.0, widely used on the internet and in the cloud, for existing Kerberized distributed systems such as Hadoop. Without such extensions, some deployments are left with two options: give up Kerberos entirely, or work around its inconveniences (Hadoop has some examples, such as bypassing SPNEGO for browser access and using delegation tokens externally, which are rather like hacks).

          With the extensions listed above, other identity systems are trusted, and credentials from those sources are used to acquire a TGT, so users can continue to access Kerberized systems without those systems having to be modified to support the identity systems or credentials directly. Kerberos implementations today, including AD and MIT, all follow this approach, and I think the Hadoop platform should also follow it in the future if possible.

          Will jwt support revoking a kvno?

          In token-preauth, a JWT token doesn't have to correspond to a Kerberos key (kvno). A user who logs in with a JWT token may have no key at all.

          After a referral is issued in a cross-realm setup, will the KDC continue to trust the origin KDC

          How a preauth mechanism handles cross-realm authentication depends on the mechanism itself. token-preauth works for the set of users in its local KDC and may not support cross-realm with AD, although the KDC itself supports cross-realm trust with AD for a different set of users.

          How will the jwt be injected in the TGT/TGS?

          The KDC (AS/TGS) decrypts the JWT token, verifies its signature, and validates it according to the JWT specifications, using the trust configuration set up in the MIT KDC for the trusted token authority.
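          The signature and validity checks described above can be sketched in Java. This is a minimal, hypothetical illustration, not the token-preauth plugin itself: it assumes an HS256 (HMAC-SHA256) signature with a secret shared between the KDC and the trusted token authority, handles only signed (not encrypted) tokens, and extracts the `sub` and `exp` claims with regular expressions instead of a real JSON parser. The names `JwtCheck`, `verify`, and `sign` are invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical, simplified JWT check (HS256 only, signed tokens only). */
final class JwtCheck {
    // Naive claim extraction; a real plugin would use a proper JSON/JOSE library.
    private static final Pattern ALG = Pattern.compile("\"alg\"\\s*:\\s*\"HS256\"");
    private static final Pattern EXP = Pattern.compile("\"exp\"\\s*:\\s*(\\d+)");
    private static final Pattern SUB = Pattern.compile("\"sub\"\\s*:\\s*\"([^\"]*)\"");

    private JwtCheck() {}

    /** Returns the token's subject if header, signature and expiry check out; otherwise null. */
    static String verify(String jwt, byte[] sharedSecret, long nowEpochSeconds) {
        String[] parts = jwt.split("\\.", -1);
        if (parts.length != 3) {
            return null; // expect header.payload.signature
        }
        try {
            Base64.Decoder dec = Base64.getUrlDecoder();
            String header = new String(dec.decode(parts[0]), StandardCharsets.UTF_8);
            if (!ALG.matcher(header).find()) {
                return null; // reject "none" and any algorithm this sketch does not handle
            }
            byte[] expected = hmac(sharedSecret, parts[0] + "." + parts[1]);
            if (!MessageDigest.isEqual(expected, dec.decode(parts[2]))) {
                return null; // signature was not produced with the trusted authority's secret
            }
            String payload = new String(dec.decode(parts[1]), StandardCharsets.UTF_8);
            Matcher exp = EXP.matcher(payload);
            if (!exp.find() || Long.parseLong(exp.group(1)) <= nowEpochSeconds) {
                return null; // missing or expired "exp" claim
            }
            Matcher sub = SUB.matcher(payload);
            return sub.find() ? sub.group(1) : null;
        } catch (IllegalArgumentException e) {
            return null; // malformed base64url or numeric claim
        }
    }

    static byte[] hmac(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    /** Builds an HS256 token; included only so the sketch can be exercised. */
    static String sign(String headerJson, String payloadJson, byte[] key) {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String signingInput = enc.encodeToString(headerJson.getBytes(StandardCharsets.UTF_8))
                + "." + enc.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        return signingInput + "." + enc.encodeToString(hmac(key, signingInput));
    }
}
```

          A real deployment would more likely use an asymmetric signature (for example RS256), so the KDC holds only the token authority's public key, and would also handle encrypted tokens, audience checks, and clock skew.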

          If the user kdc (usually AD) doesn't have the jwt plug-in, how does the jwt get injected into tickets after the referral to the (possibly MIT) cluster kdc?

          As noted above, token-preauth itself may not support cross-realm with AD in its early stage, since AD clients may not support entering a token at all. In that case, tickets for users from AD won't include a token, but that doesn't hurt Hadoop. In our prototype, if a ticket contains a token, we respect the token and its attributes; if it doesn't, we fall back to the original providers.

          it's a terrible idea. I was just seeking clarification

          Well, I agree, but not completely. By the way (slightly out of scope), how would YARN transparently support pre-existing Kerberized applications that were not written for Hadoop (or its delegation tokens), if that were ever necessary? I'm asking because YARN claims to support that, not to mention long-running services.

          I'm not certain how I feel about the service not being its own source of authority for groups...

          Well, if you deploy TokenAuth with this solution (or any other), you certainly trust the token authority you set up for your cluster, and of course you trust the tokens it issues. If you don't deploy it, which is the default, nothing is affected; everything goes to Kerberos as usual.

          Storm already does this. The level of security is arguable, but it routinely pushes new forwardable TGTs to the topology.

          Hmm, I'm not sure. Could you clarify in detail? How do the principal and keytab get prepared, without kadmin's help, for a service that runs in a container on a node that is unknown until the Resource Manager's scheduling phase? I'm not sure a forwardable TGT can serve that purpose. Yes, a task/app can use a forwardable TGT to authenticate itself to the KDC, but that's not my question, which is about services.

          Daryn Sharp (daryn) added a comment:

          token-preauth conforms to all these and won't undermine Kerberos.

          Undermine as in bypassing a password/keytab means kerberos is no longer the source of truth for passwords. Will jwt support revoking a kvno? After a referral is issued in a cross-realm setup, will the KDC continue to trust the origin KDC or will it revalidate the jwt kvno?

          [...] So an MIT KDC can trust AD for one set of users and a token authority for another set of users at the same time in a deployment, and the token-preauth plugin is only needed by the MIT KDC.

          How will the jwt be injected in the TGT/TGS? If the user kdc (usually AD) doesn't have the jwt plug-in, how does the jwt get injected into tickets after the referral to the (possibly MIT) cluster kdc?

          I think delegation tokens work well internally to bypass some Kerberos constraints.

          are they using the JWT tokens to obtain a TGT/TGS in the tasks? I think the latter?

          Great idea. Whether this should go that far, I don't know.

          No, it's a terrible idea. I was just seeking clarification. Tens of thousands of tasks per cluster bombarding kdcs isn't scalable. Delegation tokens were in part designed to avoid excessive kdc load and latency, and to shield the job from service/network interruptions to the kdc.

          Yes, token-preauth (with the new AD-TOKEN) is to be standardized. In this extension, a JWT token MAY (but is not REQUIRED to) contain groups and other useful attributes. If it does, those attributes can be extracted and used for authorization.

          If delegation tokens continue to be used, this implies that the groups from the JWT will need to be propagated into the delegation token. I'm not certain how I feel about the service not being its own source of authority for groups...

          think about it: how would you prepare principals and keytabs for long-running services scheduled to run in dynamic containers?

          Storm already does this. The level of security is arguable, but it routinely pushes new forwardable TGTs to the topology.

          Kai Zheng (drankye) added a comment:

          I'm not sure why generating principals and keytabs seems to be viewed as a difficult activity.

          Yes, generating principals and keytabs shouldn't be difficult in the normal case, but it can be difficult in a dynamically scheduled environment like YARN. We should also consider how keytabs are distributed. Developers from the Slider project probably have first-hand experience with this.

          Most if not nearly all large corporations already use kerberos in some form

          Quite agree, which is why this solution builds on Kerberos to support the token mechanism. We're not proposing an alternative to Kerberos or to the Kerberos authentication method in Hadoop, but enhancing and extending them compatibly.

          My impression it's a misnomer to call this a pre-auth extension? It's not a hardening like the encrypted timestamp pre-auth, but instead is a complete substitute for a password/keytab. Doesn't this completely undermine kerberos?

          Well, let me explain in more detail. Classically, in RFC 4120, Kerberos defined a preauthentication phase in the KDC exchange to mitigate password attacks; the encrypted timestamp preauth you mention belongs here. Preauthentication also made it possible to extend the Kerberos protocol and prove a principal's identity by means other than a password. In fact, PKINIT (RFC 4556) was defined to prove a principal's identity with an X.509 certificate. Later, FAST (RFC 6113) defined a generalized framework for defining preauth mechanisms; OTP (RFC 6560) was defined within this framework, and token-preauth is being defined in it. OTP preauth uses a one-time password, and token-preauth uses a JWT token, to prove identity instead of a password. For end users (not service principals), a password or key is in fact not required and can be removed entirely. Both AD and MIT Kerberos implement FAST and PKINIT, so there's no need to worry about that. token-preauth conforms to all these and won't undermine Kerberos.

          Will cross-realm authentication work? Do all KDCs in the trust have to use the custom plugin?

          Cross-realm trust works in Kerberos itself, and in practice MIT KDCs trusting AD are widely deployed. Just as a KDC can trust another KDC, it can trust a token issuer/authority. These trusts can work together or individually, but they don't get mixed. So an MIT KDC can trust AD for one set of users and a token authority for another set of users at the same time in a deployment, and the token-preauth plugin is only needed by the MIT KDC.

          how do tasks running in the cluster authenticate under the model? Do they continue to use the existing delegation tokens obtained via JWT/TGT during job submission

          I would think so. The JWT token support is mainly for the initial authentication before accessing Hadoop. I think delegation tokens work well internally to bypass some Kerberos constraints. Until we can push Kerberos to evolve and resolve such limits, there's nothing more we can do for now.

          are they using the JWT tokens to obtain a TGT/TGS in the tasks? I think the latter?

          Great idea. Whether this should go that far, I don't know. Maybe in the distant future Hadoop can get there, with all tokens unified into one (a standardized JWT token that works both internally and externally), but not for now, I guess.

          Mention is made of an AD-TOKEN which I believe is a non-standard MS extension? Do you envision the JWT issuer containing the group mapping for all services?

          Yes, token-preauth (with the new AD-TOKEN) is to be standardized. In this extension, a JWT token MAY (but is not REQUIRED to) contain groups and other useful attributes. If it does, those attributes can be extracted and used for authorization. If it doesn't, there's no harm at all, and group mapping can still fall back to the original providers.
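          The fallback behavior described above (use groups carried in the token when present, otherwise the existing group mapping) might look roughly like this. It is a hypothetical sketch: the `groups` attribute name, `GroupProvider`, and `TokenAwareGroups` are assumptions made for illustration. In Hadoop the fallback role would be played by the configured group mapping (`GroupMappingServiceProvider`); the small interface here only stands in for it so the example compiles on its own.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an existing group provider (e.g. shell- or LDAP-based). */
interface GroupProvider {
    List<String> getGroups(String user);
}

/** Hypothetical resolver: prefer groups from the token's attributes, else fall back. */
final class TokenAwareGroups {
    private final GroupProvider fallback;

    TokenAwareGroups(GroupProvider fallback) {
        this.fallback = fallback;
    }

    /**
     * tokenAttributes is null when the ticket carried no token,
     * e.g. for an AD user arriving through a cross-realm trust.
     */
    List<String> groupsFor(String user, Map<String, List<String>> tokenAttributes) {
        if (tokenAttributes != null) {
            // The "groups" attribute name is an assumption; the token MAY omit it.
            List<String> fromToken = tokenAttributes.get("groups");
            if (fromToken != null && !fromToken.isEmpty()) {
                return Collections.unmodifiableList(fromToken);
            }
        }
        // No token, or a token without groups: use the original provider unchanged.
        return fallback.getGroups(user);
    }
}
```

          Keeping the fallback path identical to today's behavior is what makes the token attributes optional: clusters that never issue tokens see no change in how groups are resolved.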

          something like certificates or private keys or shared secret to be involved - which would be an equal but different "pain to deploy"?

          Yes, certificates or tokens may be involved, but with less pain (not just a different kind of pain). In a fully dynamically scheduled YARN/Slider environment, think about it: how would you prepare principals and keytabs for long-running services scheduled to run in dynamic containers?

          Daryn Sharp (daryn) added a comment:

          I digress, but I'm not sure why generating principals and keytabs seems to be viewed as a difficult activity. Most if not nearly all large corporations already use kerberos in some form. Here are a few initial thoughts/questions:

          My impression it's a misnomer to call this a pre-auth extension? It's not a hardening like the encrypted timestamp pre-auth, but instead is a complete substitute for a password/keytab. Doesn't this completely undermine kerberos?

          Will cross-realm authentication work? Do all KDCs in the trust have to use the custom plugin?

          I probably overlooked it, but how do tasks running in the cluster authenticate under the model? Do they continue to use the existing delegation tokens obtained via JWT/TGT during job submission, or are they using the JWT tokens to obtain a TGT/TGS in the tasks? I think the latter?

          The doc says having the NN configured for user to group mappings is not ideal. Mention is made of an AD-TOKEN which I believe is a non-standard MS extension? Do you envision the JWT issuer containing the group mapping for all services?

          However, the pain of deploying keytabs for services could be alleviated by token support, but that's another story.

          I'm not sure how you remove the "pain" of copying a file. For mutual auth, I would expect something like certificates or private keys or shared secret to be involved - which would be an equal but different "pain to deploy"?

          Kai Zheng (drankye) added a comment:

          we need to discuss is exactly who has the problem that this solution solves.

          I quite agree. This aims to enhance Hadoop's Kerberos authentication with a token-preauth mechanism for Kerberos itself, and to allow integrating other authentication providers in clusters that essentially require Kerberos or have already deployed it. Do such scenarios make sense? I'd love to discuss and clarify this further with more feedback.

          I think that it is very interesting that this may end up making its way into MIT kerberos itself.

          We're collaborating with the MIT team on drafting the token-preauth mechanism and will then implement it based on the prototype. Hopefully we can get there in the near future; before that, we can publish the plugin implementation code for review and binaries for experimental use.

          Not sure how likely it would make it into AD though - so this will end up being a feature that requires MIT kerberos even in MS shops.

          A cluster can have an MIT Kerberos deployment with this token support serving as an internal authentication hub. AD can then be supported through a Kerberos cross-realm trust setup, and other authentication providers can be supported through a token authentication service that supports JWT tokens. Owing to this, an OAuth 2.0 token workflow would become possible for the ecosystem.

          we look at the pains of the current authentication with kerberos approach which ones are actually solved by this solution

          No. This effort doesn't attempt to resolve all the pains of Kerberos, as TokenAuth (HADOOP-9392) aims to. It focuses on providing token support on top of an existing Kerberos deployment. That means if you accept Kerberos for your cluster, with both its strengths and drawbacks, this solution gives you more integration options by adding token support for your end users' sake.

          That said, we do want to simplify Kerberos deployment for Hadoop and are working on it, which we think makes sense in the long term. That's another story, though.

          keytabs - not really - replaced by JWT tokens (assuming that this is intended for services as well as users)

          Using a token to authenticate a service isn't a problem, but it doesn't help the service authenticate clients, because that requires Kerberos keys, which must be provided by keytabs. However, the pain of deploying keytabs for services could be alleviated by token support, but that's another story.

          SPNEGO - NO - still required for REST APIs and browsers

          That's not true for browsers. A browser can obtain a token through a flow (such as the OAuth web workflow) or a user form and submit it to the server side. The server side still performs SPNEGO for compatibility with non-token access.

          Can multiple kerberos plugins be used at once - which would allow for a mixed deployment of kerberos and JWT?

          Right. Kerberos supports multiple preauthentication mechanisms, and the MIT KDC supports multiple plugins. You reminded me that I should describe a typical deployment with this token support. I'll update the design doc later. Thanks.
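          To illustrate how several preauth mechanisms can be enabled at once, so that password/keytab users and JWT users are served side by side, here is a toy Java model. It is a conceptual sketch only: `PreauthMechanism`, `PreauthRegistry`, and the predicate-based verifiers are invented here, and a real MIT KDC loads preauth modules through its `kdcpreauth` plugin interface rather than through anything like this class.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Predicate;

/** Preauth mechanisms mentioned in this thread (names only, no wire-format values). */
enum PreauthMechanism { ENC_TIMESTAMP, PKINIT, OTP, TOKEN }

/**
 * Toy model of a KDC with several preauth handlers registered at the same time.
 * Each handler is reduced to a predicate over the submitted credential.
 */
final class PreauthRegistry {
    private final Map<PreauthMechanism, Predicate<String>> plugins =
            new EnumMap<>(PreauthMechanism.class);

    /** Enables a mechanism by registering its verifier. */
    void register(PreauthMechanism mech, Predicate<String> verifier) {
        plugins.put(mech, verifier);
    }

    /** Succeeds only if the mechanism is enabled and its verifier accepts the credential. */
    boolean preauthenticate(PreauthMechanism mech, String credential) {
        Predicate<String> verifier = plugins.get(mech);
        return verifier != null && verifier.test(credential);
    }
}
```

          Requests for a mechanism with no registered handler are simply rejected, which reflects the point made in this thread that enabling the token plugin does not change how existing users authenticate.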

          lmccay Larry McCay added a comment -

          There is some interesting work here.

          What I need to think about or we need to discuss is exactly who has the problem that this solution solves.

          I think that it is very interesting that this may end up making its way into MIT kerberos itself.
          Not sure how likely it would make it into AD though - so this will end up being a feature that requires MIT kerberos even in MS shops.

          So - if we look at the pain points of the current Kerberos authentication approach, which ones are actually solved by this solution?

          • Kerberos/KDC setup - NO - in fact it is more complicated (maybe tooling can help)
          • user accounts - NO - still needed
          • keytabs - not really - replaced by JWT tokens (assuming that this is intended for services as well as users)
          • kinit - NO - still required but will present JWT instead of username/password
          • SPNEGO - NO - still required for REST APIs and browsers
          • narrow integration opportunities - YES - there are a number of solutions that can issue or exchange other tokens for JWT tokens - including Microsoft's

          Can multiple kerberos plugins be used at once - which would allow for a mixed deployment of kerberos and JWT?

          drankye Kai Zheng added a comment -

          Below is a brief introduction to the proposed solution.

          We propose adding a token-preauth mechanism to Kerberos, similar to PKINIT and OTP and built on the Kerberos pre-authentication framework, which allows users to authenticate to the KDC using a JWT token instead of a password. The KDC validates the JWT token and issues a TGT, because it trusts the token authority (issuer) via PKI. The proposal was submitted to the Kerberos community and the IETF Kitten WG, and both have expressed interest. We are currently collaborating with the MIT team on the draft to standardize the mechanism. We also built a POC that implements the token-preauth mechanism as an MIT Kerberos plugin. The plugin can be packaged separately as a Linux .so module and added to existing installations. MIT would also like us to contribute the code so it can be included in their future releases. Before that, we can make the plugin binary and source code available to the community for experimental use and review.
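          The KDC-side check described above (validate the JWT, trust its issuer, then issue a TGT) can be sketched with only the JDK. This is a minimal illustration under stated assumptions, not the POC plugin, which is an MIT Kerberos C module. The class and method names are hypothetical. The token is assumed to be an RS256-signed JWT, and the issuer's public key is assumed to have already been taken from a trusted certificate. Claim parsing is deliberately naive.

```java
import java.nio.charset.StandardCharsets;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenPreauthCheck {
    private static final Base64.Encoder ENC = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder DEC = Base64.getUrlDecoder();

    /**
     * Returns the token subject if the JWT is signed by the trusted issuer and not yet expired,
     * otherwise null. Assumes RS256; the JWT header is not inspected in this sketch.
     */
    public static String verify(String jwt, PublicKey trustedIssuerKey, long nowEpochSeconds)
            throws Exception {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) {
            return null;
        }
        Signature sig = Signature.getInstance("SHA256withRSA");
        sig.initVerify(trustedIssuerKey);
        sig.update((parts[0] + "." + parts[1]).getBytes(StandardCharsets.US_ASCII));
        if (!sig.verify(DEC.decode(parts[2]))) {
            return null; // not signed by the issuer the KDC trusts
        }
        String claims = new String(DEC.decode(parts[1]), StandardCharsets.UTF_8);
        String exp = claim(claims, "exp");
        if (exp == null || Long.parseLong(exp) <= nowEpochSeconds) {
            return null; // missing or past expiry
        }
        return claim(claims, "sub"); // would be mapped to a principal in the KDC database
    }

    // Naive lookup of a flat string or number claim; a real plugin would use a JSON parser.
    private static String claim(String json, String name) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"?([^\",}]+)")
                .matcher(json);
        return m.find() ? m.group(1) : null;
    }

    /** Stand-in for a third-party token authority, used only to produce test tokens. */
    public static String issue(String claimsJson, PrivateKey issuerKey) throws Exception {
        String header = ENC.encodeToString(
                "{\"alg\":\"RS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = ENC.encodeToString(claimsJson.getBytes(StandardCharsets.UTF_8));
        Signature sig = Signature.getInstance("SHA256withRSA");
        sig.initSign(issuerKey);
        sig.update((header + "." + payload).getBytes(StandardCharsets.US_ASCII));
        return header + "." + payload + "." + ENC.encodeToString(sig.sign());
    }
}
```

          In this sketch, trusting the issuer "via PKI" is reduced to holding its public key. A real KDC plugin would first validate the issuer's certificate chain. It would also need the returned subject to match an existing principal, which is why identity accounts must be synced to the KDC.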

          So, ideally, the token-preauth plugin is deployed to an MIT Kerberos installation. End users authenticate to third-party JWT token authorities to obtain tokens, and then use those tokens to acquire a Kerberos TGT from the KDC. On that basis we implemented token authentication for Hadoop with only a few central modifications to the code base. No new Authentication Method has to be added, and the solution builds on the existing Kerberos support.

          We added KrbTokenLoginModule, which extends Krb5LoginModule and adds support for logging in with a token or a token cache. The new module is compatible with Krb5LoginModule in both configuration and functionality, so it can be used safely in its place.
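          Because the new module is described as configuration-compatible with Krb5LoginModule, a JAAS entry for it would presumably carry the usual Krb5LoginModule options plus token-specific ones. The Java sketch below builds such an entry programmatically with the standard `javax.security.auth.login.Configuration` API. The module's fully qualified class name and the `tokenCache` option name are hypothetical placeholders, since the patch's actual names are not given here. `useTicketCache` and `doNotPrompt` are genuine Krb5LoginModule options.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

public class TokenLoginConfig extends Configuration {
    // Hypothetical class name for the new module; the real package is not specified here.
    public static final String TOKEN_MODULE = "org.example.security.KrbTokenLoginModule";

    private final String tokenCachePath;

    public TokenLoginConfig(String tokenCachePath) {
        this.tokenCachePath = tokenCachePath;
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        Map<String, String> options = new HashMap<>();
        // Standard Krb5LoginModule options, kept as-is because the new module is described as compatible.
        options.put("useTicketCache", "true");
        options.put("doNotPrompt", "true");
        // Hypothetical token-specific option: where to read a previously obtained JWT.
        options.put("tokenCache", tokenCachePath);
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(TOKEN_MODULE, LoginModuleControlFlag.REQUIRED, options)
        };
    }
}
```

          An instance could then be passed to the `LoginContext` constructor that accepts a `Configuration`. Actually calling `login()` would of course require the real module on the classpath and a KDC with the token-preauth plugin.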

          We also added KerberosTokenAuthenticationHandler to support Hadoop web interfaces. It extends KerberosAuthenticationHandler, adds support for token authentication, and performs the SPNEGO negotiation purely on the server side. Again, the new handler is compatible with KerberosAuthenticationHandler and can be used safely in its place.
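          One way to picture the handler's compatibility is as a dispatch on the incoming `Authorization` header. A token-bearing request goes down the token path, with the SPNEGO step then completed on the server side. A standard `Negotiate` request falls through to the existing SPNEGO logic. The text does not say how the browser or REST client hands its token to the server. The Java sketch below assumes an `Authorization: Bearer <jwt>` header (the OAuth 2.0 convention from RFC 6750). It shows only the dispatch. The enum and method names are hypothetical, not the actual KerberosTokenAuthenticationHandler code.

```java
import java.util.Locale;

public class AuthHeaderDispatch {
    /** Which authentication path a request should take. */
    public enum Path { TOKEN, SPNEGO, CHALLENGE }

    /** Chooses a path from the raw Authorization header value, which may be null. */
    public static Path choose(String authorizationHeader) {
        if (authorizationHeader == null) {
            return Path.CHALLENGE; // no credentials: answer 401 with WWW-Authenticate: Negotiate
        }
        String h = authorizationHeader.trim();
        int space = h.indexOf(' ');
        if (space <= 0) {
            return Path.CHALLENGE; // empty header, or a scheme with no credentials
        }
        // HTTP authentication scheme names are case-insensitive.
        String scheme = h.substring(0, space).toLowerCase(Locale.ROOT);
        switch (scheme) {
            case "bearer":
                // Assumed carrier for the JWT; the handler would exchange it for a service
                // ticket and complete the SPNEGO step itself, on the server side.
                return Path.TOKEN;
            case "negotiate":
                // Existing SPNEGO path, unchanged, so Kerberos-only clients keep working.
                return Path.SPNEGO;
            default:
                return Path.CHALLENGE;
        }
    }
}
```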

          The token is exchanged for a Kerberos ticket, and the ticket is then presented to Hadoop services as usual. In addition, token attributes may be needed to enforce fine-grained authorization or other policies. For that purpose, when the KDC issues a ticket based on the token, it encapsulates a derivation of the token in the ticket as authorization data. On the service side (Hadoop services), the token can then be queried and extracted from the service ticket. We made this work in both GSSAPI and SASL contexts, since Hadoop uses both.
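          Kerberos tickets carry authorization data as a sequence of (ad-type, ad-data) pairs (RFC 4120, section 5.2.6). This is what makes it possible to embed a token derivation at issue time and recover it on the service side. The Java sketch below models only that idea: the KDC appends an element holding selected token claims, and a service looks it up by type. The ad-type value, the JSON encoding of the derivation and the class names are hypothetical. The real ASN.1 encoding and ticket encryption are omitted.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class TicketAuthzSketch {
    /** One AuthorizationData element: an ad-type number plus opaque ad-data bytes. */
    public static final class AdElement {
        final int adType;
        final byte[] adData;

        AdElement(int adType, byte[] adData) {
            this.adType = adType;
            this.adData = adData.clone();
        }
    }

    /** Hypothetical ad-type for the token derivation; not a registered value. */
    public static final int AD_TOKEN_PLACEHOLDER = -2000;

    private final List<AdElement> authorizationData = new ArrayList<>();

    /** Adds an arbitrary element, e.g. one the KDC would include anyway. */
    public void addElement(int adType, byte[] adData) {
        authorizationData.add(new AdElement(adType, adData));
    }

    /** KDC side: embed the token derivation (here, selected claims as JSON) at issue time. */
    public void addTokenDerivation(String derivedClaimsJson) {
        addElement(AD_TOKEN_PLACEHOLDER, derivedClaimsJson.getBytes(StandardCharsets.UTF_8));
    }

    /** Service side: extract the token derivation, if this ticket was issued from a token. */
    public Optional<String> tokenDerivation() {
        for (AdElement e : authorizationData) {
            if (e.adType == AD_TOKEN_PLACEHOLDER) {
                return Optional.of(new String(e.adData, StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }
}
```

          On a real GSSAPI acceptor in Java, ticket authorization data is exposed through `com.sun.security.jgss.ExtendedGSSContext.inquireSecContext(InquireType.KRB5_GET_AUTHZ_DATA)`. That call returns `AuthorizationDataEntry` objects with `getType()` and `getData()`. A lookup like `tokenDerivation()` above would sit on top of those entries.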

          As far as we can see, the main concerns with this solution are that it requires deploying an additional plugin on existing Kerberos installations and syncing identity accounts from identity management systems to the Kerberos KDC. Most importantly, it requires a Kerberos deployment as a prerequisite. We are also discussing with the MIT team how to simplify Kerberos deployment, especially for large Hadoop clusters. We are also discussing how to reduce the overhead, such as identity sync, of employing the PKINIT/token-preauth mechanisms.

          drankye Kai Zheng added a comment -

          Attached is the design doc. Your comments are welcome. Thanks.


            People

            • Assignee: drankye Kai Zheng
            • Reporter: drankye Kai Zheng
            • Votes: 0
            • Watchers: 19
