Hadoop Common / HADOOP-9392

Token based authentication and Single Sign On

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: security
    • Labels:
      None
    • Tags:
      Project Rhino

      Description

      This is an umbrella entry for one of Project Rhino’s topics; for details of Project Rhino, please refer to https://github.com/intel-hadoop/project-rhino/. The major goal for this entry, as described in Project Rhino, was:

      “Core, HDFS, ZooKeeper, and HBase currently support Kerberos authentication at the RPC layer, via SASL. However this does not provide valuable attributes such as group membership, classification level, organizational identity, or support for user defined attributes. Hadoop components must interrogate external resources for discovering these attributes and at scale this is problematic. There is also no consistent delegation model. HDFS has a simple delegation capability, and only Oozie can take limited advantage of it. We will implement a common token based authentication framework to decouple internal user and service authentication from external mechanisms used to support it (like Kerberos)”

      We’d like to start our work from Hadoop Common and try to provide common facilities by extending the existing authentication framework to support:
      1. Pluggable token provider interface
      2. Pluggable token verification protocol and interface
      3. Security mechanism to distribute secrets in cluster nodes
      4. Delegation model of user authentication
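A rough sketch, in Java, of what the pluggable provider and verifier facilities in items 1 and 2 might look like. None of these types exist in Hadoop; every name here is hypothetical and only illustrates the kind of plugin surface being proposed:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical shapes for facilities 1 and 2 above.
interface TokenProvider {
    // Authenticate a principal against some backend and issue an opaque
    // token with identity attributes bound into it.
    byte[] issueToken(String principal, Map<String, String> attributes);
}

interface TokenVerifier {
    // Verify a token issued by a cooperating provider and recover the
    // attributes bound into it.
    Map<String, String> verify(byte[] token);
}

// A toy in-memory implementation, standing in for a real backend such as
// Kerberos or an LDAP-backed identity provider.
class InMemoryTokenService implements TokenProvider, TokenVerifier {
    private final Map<String, Map<String, String>> issued = new HashMap<>();

    @Override
    public byte[] issueToken(String principal, Map<String, String> attributes) {
        String id = UUID.randomUUID().toString();
        Map<String, String> attrs = new HashMap<>(attributes);
        attrs.put("principal", principal);
        issued.put(id, attrs);
        return id.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public Map<String, String> verify(byte[] token) {
        Map<String, String> attrs = issued.get(new String(token, StandardCharsets.UTF_8));
        if (attrs == null) {
            throw new IllegalArgumentException("unknown or revoked token");
        }
        return attrs;
    }
}
```

In a real deployment the verification side would run in a different process from the issuing side, which is why item 3 (a mechanism to distribute secrets across cluster nodes) matters; the shared map here is only a stand-in.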

      Attachments

      1. TokenAuth-breakdown.pdf
        306 kB
        Kai Zheng
      2. token-based-authn-plus-sso.pdf
        555 kB
        Kai Zheng
      3. token-based-authn-plus-sso-v2.0.pdf
        838 kB
        Kai Zheng

        Issue Links

          Activity

          Daryn Sharp added a comment -

          I've been working to decouple Kerberos as the only auth method for Hadoop. My work has stalled, but I will resume soon, so we need to ensure we don't collide.

          Thomas NGUY added a comment -

          Wonderful, Kai!

          To start, I'd like to ask some questions about the subject; as I have no experience with Hadoop, some points are still unclear to me.

          "Core, HDFS, ZooKeeper, and HBase currently support Kerberos authentication at the RPC layer, via SASL. However this does not provide valuable attributes such as group membership, classification level, organizational identity, or support for user defined attributes. Hadoop components must interrogate external resources for discovering these attributes and at scale this is problematic"

          I've seen that the NameNode and JobTracker get information about the user by using the username and a pluggable interface that maps the username to the set of groups the user belongs to. Is this method problematic at larger scale? What do we have to do in that case: include the user information in the token?

          "We will implement a common token based authentication framework to decouple internal user and service authentication from external mechanisms used to support it (like Kerberos)”

          Here also, what is the problem with Kerberos's token based authentication? What does "common" token based authentication mean? Is there a link with the interactions of Hadoop components (see http://clustermania.blogspot.jp/2011/11/hadoop-how-it-manages-security.html)?

          These questions may seem naive, but I really need to understand the subject better before starting. By the way, I'm doing my master's research at the NII (National Institute of Informatics) in Tokyo; it's already late at night here, so I may not be able to answer the same day.

          Best regards.
          Thomas

          Kai Zheng added a comment -

          Hi Daryn,

          Thanks for letting me know about your work. I did some investigation into recent community work on Hadoop security and authentication, in JIRA HADOOP-8779 with its subtasks and related JIRAs. Most of those JIRAs are already closed. It’s great work, and I can see the Hadoop security related code is cleaner now. For your work decoupling Kerberos as the only auth method for Hadoop, which JIRA tracks it, if you have one? If you don’t have such a JIRA or a design document for me to read so that we can avoid colliding, would you please explain it? In particular I have the questions below.
          1. I just checked the recent code. UserGroupInformation.java has SIMPLE, KERBEROS, TOKEN, CERTIFICATE, KERBEROS_SSL, and PROXY as authentication methods, and most of them are newly added compared with Hadoop 1.x. Does this indicate that more authentication methods will or can be supported like these? Are these methods used, or intended to be used, for internal or external authentication? If it’s appropriate to speak of “internal” and “external” here, which methods are for which? How do we prevent clients from misusing them, for example using an internal method in an external situation? In my understanding, TOKEN (DIGEST) can be used both externally (though I’m not sure such an application exists) and internally, for example when a delegation token or job token is involved. Right?
          2. In the Rhino project we’re coming up with a common token for Hadoop to authenticate to external identity systems, which allows various existing authentication mechanisms to be used while the Hadoop security core doesn’t have to understand them, since it only needs to talk to the common token. What do you think about this? Does it conflict with your work?

          Thanks for your time.

          Regards,
          Kai

          Kai Zheng added a comment -

          Hi Thomas,

          We discussed this via email in a long thread, and it’s great that you are now about to start implementing the common token. To involve the community more, I’d like to cite your questions from the email about how to implement it, and answer them in this JIRA. See below. Thanks.

          Question 1: “If I'm not wrong, the authentication method is set at initialization by looking at "core-site.xml". During authentication, Hadoop has to check the UGI to determine which mode is used, and then choose the path accordingly (if authentication=.....).
          To decouple Hadoop internal use from external authentication, we should remove all these paths and choose only one solution (based on common token). It sounds rather complex depending on the context..”

          Answer: Your understanding of current Hadoop authentication is right. There are already a few authentication methods, and one of them can be configured and then used globally. To go our way and also keep backward compatibility, we can keep the existing methods and at the same time add a new authentication method for common token authentication, as an advanced option like Kerberos. To implement the new method, we can have a token provider service which issues/validates/renews/revokes tokens for users, using a pluggable authentication module with an external authentication provider as the backend. To keep it simple, the token provider can be implemented and shipped as just a library; in the future we can promote it to a standalone service.
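The issue/validate/renew/revoke lifecycle described in this answer could be sketched as below. This is an illustrative in-memory library, not the proposed implementation, and every class and method name in it is made up:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical library-style token provider with the four lifecycle
// operations named in the answer above: issue, validate, renew, revoke.
class TokenProviderService {
    // Maps each issued token to its expiry time.
    private final Map<String, Instant> expiry = new HashMap<>();

    // Issue a token for a principal with a bounded lifetime.
    String issue(String principal, Duration lifetime) {
        String token = principal + ":" + UUID.randomUUID();
        expiry.put(token, Instant.now().plus(lifetime));
        return token;
    }

    // A token is valid if we know it and it has not expired.
    boolean validate(String token) {
        Instant e = expiry.get(token);
        return e != null && Instant.now().isBefore(e);
    }

    // Extend the lifetime of a known token.
    void renew(String token, Duration extension) {
        expiry.computeIfPresent(token, (t, e) -> e.plus(extension));
    }

    // Revoked tokens fail validation immediately.
    void revoke(String token) {
        expiry.remove(token);
    }
}
```

A real provider would of course sign tokens and consult the pluggable external backend during `issue`; this sketch only shows the lifecycle surface.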

          Question 2: “Do you have any suggestion about how the "common token provider" should be implemented?
          I have picture a mechanism like Shibboleth (for my intercloud scenario), the user is redirected to its authentication system to provide credentials (username,password) and if the authentication succeeds, the common token is provided.”
          “Do you have any idea which form the user attributes should be implemented in the common token?”

          Answer: Following my answer to question 1, this comes down to how to implement the token provider service mentioned above. Provided the token provider service interfaces are defined for Hadoop, to implement them we can consider existing SSO/federation standards and solutions such as SAML or Shibboleth. For the common token, we may consider the SAML token as its format.

          Question 3: “After reading the discussions of the related Jiras, I also have another question: By using our framework, user will be forced to use common token even if he uses "simple authentication". Is it problematic?”

          Answer: As answered for question 1, we keep the existing methods when we provide this advanced option, and Hadoop customers won’t be forced to use any particular one.

          Thomas NGUY added a comment -

          Thank you for your answer, Kai.

          As you have noticed, someone has recently created a JIRA to allow new authentication mechanisms based on JAAS and SASL in Hadoop:
          https://issues.apache.org/jira/browse/HADOOP-9479
          His work could be very interesting for us, since we're basically trying to implement a new authentication mechanism while keeping the code backward compatible.

          Plus, his work could be coupled with https://github.com/biancini/Shibboleth-Authentication/tree/master/jaas_module, which is a JAAS module for Shibboleth.
          But I guess Shibboleth cannot be used as-is, since it doesn't provide a token.

          Concerning the "common token", the idea, if I'm not wrong, is to insert the user attributes into it so that Hadoop internal services won't need to call a pluggable function to get them. However, does that mean the "common token" will also be transmitted to Hadoop internal services? Because we already have tokens to authenticate to Hadoop internal services (delegation token, job token, ...), it means we will have to deal with two tokens.

          Thanks for reading me.

          Kai Zheng added a comment -

          Hi Thomas,

          Thanks for the notice. I’ve linked HADOOP-9479 and also HADOOP-9296 to this issue. Yes, the custom authentication provider in HADOOP-9479’s patch can be helpful for us in implementing token based authentication. Meanwhile, that work plus the Shibboleth JAAS module you mentioned has nothing to do with the common token in our context. That is what this JIRA focuses on: abstracting a common token over all kinds of authentication mechanisms based on identity products and backends, and using that token in Hadoop. You’re right that the common token holds identity attributes bound in from the external authentication system and will be used in Hadoop for authorization and audit; thus we do have to deal with more tokens, since we already have the delegation token, job token, etc. Regarding how they coexist and how they differ, let me quote a previous email I sent, below.

          “As you can see in the initial security design doc, the tokens you mentioned came about as workarounds for situations where Kerberos isn’t suitable due to performance, deployment, etc.
          Such tokens differ from the common authentication token targeted in the Rhino project in the following respects.
          1. Those tokens are essentially shared secrets for internal use among Hadoop services, while the common token is for authenticating to an external identity system;
          2. Those tokens are of various formats and very application specific, while the common token aims to be a unified carrier of both basic identity attributes and extended attributes;
          3. The common token can be used for single sign on among Hadoop services, while those tokens were never intended for that;
          4. The common token with its built-in attributes can be used for authorization in Hadoop services, but those tokens are only applicable to their targeted, special scenarios.
          At the same time, they don’t conflict.
          1. When external authentication succeeds and a common token is returned, relevant Hadoop services/clients can still request the existing tokens, for example a job token, for internal use;
          2. Maybe we can borrow some “patterns” or “models” from existing tokens for our common token design.
          Anyhow, the common token is not meant to replace the existing tokens; they can coexist, serving different purposes.”

          For you to start with, and for a possible POC of this concept if you’d like, I think you could:
          1. Define a simple common token;
          2. Choose one authentication mechanism and a JAAS module, as you may already have for Shibboleth;
          3. Wrap the authentication result and identity attributes into the common token;
          4. Authenticate the client’s common token to the server via the mechanism proposed in HADOOP-9479;
          5. Try to use the common token for authorization.
          Please let me know if this is doable for you.

          Thanks.
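Very roughly, steps 1, 3, and 5 of the POC above might look like the following in Java. This is a toy sketch; the authentication result is simulated rather than coming from a real JAAS/Shibboleth module, and all names are invented:

```java
import java.util.Map;

// Toy POC shapes for the steps above: a trivial "common token" carrying
// identity attributes from an (here simulated) external authentication,
// then consulted for an authorization decision.
class CommonTokenPoc {

    // Step 1: a simple common token holding a principal plus attributes.
    static final class CommonToken {
        final String principal;
        final Map<String, String> attributes;

        CommonToken(String principal, Map<String, String> attributes) {
            this.principal = principal;
            this.attributes = Map.copyOf(attributes);
        }
    }

    // Step 3: wrap the (simulated) authentication result and the identity
    // attributes returned by the IDP into the common token.
    static CommonToken wrapAuthResult(String principal, Map<String, String> idpAttributes) {
        return new CommonToken(principal, idpAttributes);
    }

    // Step 5: a toy authorization check driven by a token attribute.
    static boolean mayAccess(CommonToken token, String requiredGroup) {
        return requiredGroup.equals(token.attributes.get("group"));
    }
}
```

Step 4 (authenticating the token to the server over the HADOOP-9479 mechanism) is deliberately left out, since it depends on that patch's actual API.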

          Kai Zheng added a comment -

          Here is the first version of our architecture and design doc for review. Comments are welcome.

          Larry McCay added a comment -

          Thank you for the design doc, Kai. Interesting read. I will be organizing my thoughts and questions over the next couple of days and hope to have some for you soon. I think it may make sense to concentrate on one area at a time with my comments and discussion. Thanks again.

          Thomas NGUY added a comment -

          Wow, thank you for posting the design doc, Kai. It is a great job. I will think it over on my own and get back to you. Regards.

          Larry McCay added a comment -

          Hello Kai - I have added a document to https://issues.apache.org/jira/browse/HADOOP-9533 that describes the client interactions with the HSSO service, and I have called out a couple of aspects that we need to rationalize with our related efforts.

          One is the composability of the authentication providers in your effort within the chains of HSSO token endpoints; it is listed as an open question in that document. Another relates to the granularity of the tokens in our proposals. The HSSO overview describes the acquisition and use of a couple of different token types. One is the cluster access token, which is issued to users to allow them to request service access tokens in order to access specific services within the cluster.

          The cluster access token records the authenticated identity and claims that represent the event of authentication. It has a longer lifespan than service access tokens and is cryptographically verifiable. Service access tokens are issued to represent the ability of an authenticated user to request resources from a particular service. They contain a representation of the authenticated user as well as additional identity and profile attributes to be used in authorization policy. Giving the service access token a relatively short lifespan allows groups and other attributes to be refreshed in a timely manner, while not requiring re-authentication or attribute server interaction within the lifespan of the access token.
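As a toy illustration of the two lifespans described here (all names and durations below are invented, not from the HSSO document):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

// Illustrative-only model of the two token types: a longer-lived cluster
// access token recording the authentication event, and short-lived
// service access tokens carrying freshly fetched attributes.
class TwoTokenModel {

    static class Token {
        final String subject;
        final Map<String, String> attributes;
        final Instant expires;

        Token(String subject, Map<String, String> attributes, Duration lifetime) {
            this.subject = subject;
            this.attributes = Map.copyOf(attributes);
            this.expires = Instant.now().plus(lifetime);
        }

        boolean expired() { return Instant.now().isAfter(expires); }
    }

    // Cluster access token: long lifespan, records the authentication event.
    static Token clusterToken(String user, String authMethod) {
        return new Token(user, Map.of("authn.method", authMethod), Duration.ofHours(8));
    }

    // Service access token: short lifespan; attributes are fetched fresh at
    // request time so authorization policy sees current group membership.
    static Token serviceToken(Token cluster, Map<String, String> freshAttributes) {
        if (cluster.expired()) {
            throw new IllegalStateException("cluster token expired: re-authenticate");
        }
        return new Token(cluster.subject, freshAttributes, Duration.ofMinutes(10));
    }
}
```

The point of the split is visible in `serviceToken`: attributes are re-read on each short-lived issuance, while the authentication event itself is only re-checked against the long-lived cluster token.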

          I am looking forward to your thoughts on rationalizing and collaborating on these particular aspects of your pluggable authentication providers and token based authentication. Please take a look at the overview document in https://issues.apache.org/jira/browse/HADOOP-9533 and let me know what you think.

          Thanks!

          Kai Zheng added a comment -

          I agree we need to rationalize to avoid overlap and more importantly simplify configuration and deployment issues for the ecosystem. Let’s compare TokenAuth with HSSO, and see what is missing that HSSO would need to add.

          HSSO provides a Cluster Access Token, which is similar to the Identity Token in TokenAuth. HSSO also provides a Service Access Token to authenticate to targeted services and to carry additional attributes for authorization. In TokenAuth, the Identity Token (from now on just “access token”) is issued to the client when authentication to the Token Authentication Service (TAS) for the user domain succeeds. The same token is then used to authenticate to the target service. TokenAuth’s access token also carries various attributes set at authentication time, and supports arbitrary additional attributes. In HADOOP-9466 we have designed a unified authorization framework based on this singular access token. In my view, the access token is for authorization of concrete resources. HSSO’s “Service Access Token” is, in contrast, an authenticator: it lacks features for fine grained authorization. I think that is why you are wondering to what degree you can match the OAuth spec to this. The HSSO server registers and maintains authentication providers, and TokenAuth’s TAS does the same work. However, TAS not only covers authentication mechanisms for RPC and command line environments, it also covers IDPs for web services. Together, the two parts of TokenAuth’s TAS form a complete solution.

          What is interesting about HSSO is that it targets the multiple Hadoop cluster use case and lets clients dynamically discover the right authentication provider for the given target cluster. TokenAuth targets mainly the singular cluster use case but allows multiple organizations to use it. I think HSSO and TokenAuth can complement each other in a seamless way if HSSO can focus on the multiple cluster use case layered on top of TokenAuth authentication within the cluster, as follows:

          1. The HSSO server is the central registry of Token Authentication Services (TASes) of one or more Hadoop clusters, and it accepts client requests to be forwarded to the appropriate TAS for authentication in the target cluster;
          2. Enhance the TAS to allow it to identify an HSSO server by configuring the HSSO registry endpoint;
          3. The TAS registers itself to the HSSO with the cluster name as the key;
          4. Hadoop clients first contact the HSSO to locate the appropriate TAS by cluster name and domain, then connect to the TAS for authentication. The concrete authentication provider(s) that will be used to authenticate the client will be determined by the TAS by default. Of course if HSSO needs to support dynamically discovering IDP(s) for clients this would be ok since it won’t impact the TAS much. Note that the discovery should be local to the TAS.
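The registration and lookup flow in steps 1-4 above might be sketched, purely illustratively, as:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the HSSO registry interaction: each cluster's
// TAS registers under its cluster name (step 3), and clients ask the
// registry which TAS endpoint to authenticate against (step 4).
class HssoRegistry {
    private final Map<String, String> tasByCluster = new HashMap<>();

    // Step 3: a TAS registers itself, keyed by cluster name.
    void register(String clusterName, String tasEndpoint) {
        tasByCluster.put(clusterName, tasEndpoint);
    }

    // Step 4: a client locates the TAS for its target cluster; the client
    // then authenticates directly to that TAS, which chooses the concrete
    // authentication provider(s) locally.
    Optional<String> locate(String clusterName) {
        return Optional.ofNullable(tasByCluster.get(clusterName));
    }
}
```

The endpoint string here is a placeholder; the real registry would presumably carry richer metadata (domain, supported mechanisms) per cluster.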

          Please let me know your thoughts. Thanks.

          Daryn Sharp added a comment -

          I've been swamped lately, but I'll try to catch up on this jira this afternoon.

          Larry McCay added a comment -

          Thank you for taking the time to read the overview, Kai. You've pointed out a couple of things that I will have to make more clear in a revision to that document. One being that you interpreted the service access token as purely being used as an authenticator. This is not actually the case. One of the primary purposes of having a secondary token for accessing the service is carrying the arbitrary attributes used for fine-grained access control decisions. This is why the service access token expires frequently. Since the acquisition of these attributes is done at service access token request time, the attributes that policy interrogates can be as fresh as possible without having to re-authenticate. Cluster access tokens, on the other hand, are longer lived and are used to represent the authenticated user and the event of authentication itself. This way, service access tokens can be provided to clients that have been authenticated with specific types of authentication event requirements - like a strength level of the authentication performed. Perhaps a deployment would prefer one IDP over another for accessing a particular service. Like the Authorization Codes in OAuth, the cluster access token can be persisted and utilized by many types of applications - upon expiry it needs to be refreshed or the user needs to be re-authenticated. By indicating specific scopes for the cluster access token, it may be given to third parties to use on the user's behalf with a constrained ability to acquire service access tokens, limited to those services indicated by the list of scopes. We see the separation of concerns between the two tokens as important to eliminate unnecessary re-authentication, limit the amount of damage that a compromised token can enable, and ensure freshness of authorization-related attributes.
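The two-token split described above could be sketched roughly as follows. Everything here is a hypothetical illustration of the idea, not the proposed design: the field names, lifetimes, and function names are all invented for the sketch.

```python
import time

# Hypothetical sketch of the two-token model: a long-lived cluster access
# token representing the authentication event, and a short-lived service
# access token whose attributes are fetched fresh at exchange time.

CLUSTER_TOKEN_TTL = 24 * 3600   # long-lived: represents the authentication event
SERVICE_TOKEN_TTL = 5 * 60      # short-lived: forces fresh attribute lookups

def issue_cluster_token(user, auth_strength, now=None):
    now = time.time() if now is None else now
    return {"sub": user,
            "auth_strength": auth_strength,   # e.g. which IDP / how strong
            "expires": now + CLUSTER_TOKEN_TTL}

def issue_service_token(cluster_token, service, fetch_attributes, now=None):
    """Exchange a cluster token for a short-lived service token.

    Attributes are fetched at exchange time, so policy always sees values
    at most SERVICE_TOKEN_TTL old, without re-authenticating the user.
    """
    now = time.time() if now is None else now
    if now >= cluster_token["expires"]:
        raise ValueError("cluster token expired: re-authenticate")
    return {"sub": cluster_token["sub"],
            "service": service,
            "attributes": fetch_attributes(cluster_token["sub"]),
            "expires": now + SERVICE_TOKEN_TTL}

# Example: group membership is re-read on every service-token exchange.
groups = {"alice": ["analysts"]}
ct = issue_cluster_token("alice", auth_strength="kerberos")
st = issue_service_token(ct, "hdfs", lambda u: {"groups": groups[u]})
print(st["attributes"])  # {'groups': ['analysts']}
```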

          Your thoughts on HSSO across clusters are very much in line with what we envision as our use of HSSO within the perimeter security story that is targeted by Knox. I tried to leave the multi-cluster use cases out of this overview so as not to muddy the waters with the perimeter story. It seems that maybe it leaked into the document a bit - I guess the IDP Discovery protocol leaks it. Anyway, I am curious whether you see a use case for multiple clusters without it being a perimeter-based deployment. I think that I will save the discussion of the multi-cluster deployments for another thread though.

          We definitely envision and plan on a service registry component to HSSO which aligns well with your thoughts about layering HSSO over TAS - we see the authenticating and service registry components as going hand in hand.

          We are in the process of making arrangements for a security get-together at Hadoop Summit next month. I hope that you will be able to attend. This will be a great opportunity to get together and talk about this rationalization, layering, and the multi-cluster deployment use cases. We should try and identify aspects of this work for the agenda.

          Thomas NGUY added a comment -

          From my point of view, we have two different designs for SSO in Hadoop, but they are not necessarily incompatible.
          Concerning the « Token » design: in HADOOP-9533, the Service Access Token targets a specific resource (defined by the service URL) and has a short lifetime, while in HADOOP-9392, the Identity Token can be used for any service? (belonging to a trusted token realm) and doesn't have an expiration time.
          Both of them carry extended attributes for fine-grained access control decisions or for the service itself.
          I'm just curious to learn more about how the Unified Authorization Framework (HADOOP-9466) would use the common token to make decisions.

          Kai Zheng added a comment -

          Larry - You use an extra Service Access Token to “eliminate unnecessary re-authentication, limit the amount of damage that a compromised token can enable and ensure freshness of authorization related attributes”; however, TokenAuth addresses these concerns without requiring two types of tokens. The Identity Token is cached at the client and reused for subsequent requests to avoid re-authentication. Tokens in the cache expire at a configurable interval, typically a few hours rather than days. When the client requests access to a service, the token provided by the client is authenticated against the TAS configured for the service; at that time the TAS validates the token, verifies that the associated identity is still valid, and checks whether the principal is still allowed to access the service. Furthermore, identity attributes should be as fresh as the identity itself. As with other attributes that can be used for authorization, they should be collected and evaluated on a per-request basis, and they often involve resources other than coarse-grained service-level access considerations. Such attributes are often retrieved during authorization enforcement from an Attribute Authority (AA); in a XACML context, for example, that would happen in the PDP. We cover fine-grained authorization in the Unified Authorization Framework (HADOOP-9466) based on TokenAuth.
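The per-request checks described here (token validity, identity validity, service access) could be sketched like this. The function and store names are hypothetical illustrations, not TokenAuth code.

```python
import time

# Illustrative sketch of the TAS-side checks described for a single
# Identity Token; all names (validate_for_service, identity_store, acl)
# are invented for this sketch.

def validate_for_service(token, service, identity_store, acl, now=None):
    """Validate an identity token on each service access:

    1. the token itself must not be expired,
    2. the identity behind it must still be valid, and
    3. the principal must still be allowed to access the service.
    """
    now = time.time() if now is None else now
    if now >= token["expires"]:
        return False
    principal = token["sub"]
    if not identity_store.get(principal, {}).get("active", False):
        return False
    return service in acl.get(principal, set())

identity_store = {"alice": {"active": True}}
acl = {"alice": {"hdfs"}}
token = {"sub": "alice", "expires": time.time() + 3600}
print(validate_for_service(token, "hdfs", identity_store, acl))  # True
```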

          If there’s a relationship between a Service Access Token and an OAuth Access Token, I’m not sure it’s the right one. In OAuth, the Access Token is issued by the Authorization Server under an Authorization Grant from the Resource Owner. Do you intend for the HSSO service to take on the role of an OAuth Resource Owner? If so, how will you handle service users?

          Please consider whether we can avoid introducing additional tokens like the Service Access Token; we can probably address your design concerns in other ways. Also, do we need two similar tokens for authentication and SSO in Hadoop (the Identity Token in TokenAuth, and the Cluster Access Token in HSSO)? I believe Hadoop only needs one. I would think the first step toward aligning these efforts is cleaning up the redundant constructs that emerge from the two partially overlapping proposals. We should consider a deployment model where we build HSSO layered on TAS.

          Sounds like a good plan for us to meet at the Hadoop Summit. I will try but not sure if I can make it. Others from the team who are also working on this will be there and participate in the discussion. If there is a bridge, I will try to join the bridge. Thanks for setting this up.

          Larry McCay added a comment -

          Hi Kai - my previous response hadn't really intended to compare the designs as much as correct something that - based on your interpretation - must have been understated in my client interaction overview on HADOOP-9533. Any decent design would strive to address these same goals - I didn't mean to imply that yours doesn't. Ultimately, we just want to make sure that we are covering all relevant goals and use cases in our converged approach. I mentioned the OAuth similarity as an analogue with an existing token based protocol for addressing restricted capabilities of a given token. The same type of analogue can be drawn between HADOOP-9533 and Kerberos. The cluster access token is similar to the TGT while the service access token is much like a service ticket. Service user scenarios are covered in an upcoming overview document for In-cluster trust establishment and service:service authentication that will be published on HADOOP-9533.

          I do agree that keeping down the number of tokens that a client/user-agent is required to keep track of is a good idea from the complexity and developer experience perspectives. We will however have to strike a balance between simplicity and a design that meets all of our goals.

          Enumerating the properties of each design at this point will provide little value. Instead, I propose that we articulate all of the relevant goals and use cases of the required design, including attack vectors and scenarios that we must address with the ultimate design. Deployment scenarios of the Hadoop cluster will be important to consider as well. We will also need to take into account the capabilities of all the clients in the ecosystem for supporting it. With some of this groundwork done, we can get together at the Summit with an agenda item to reconcile those goals and use cases into a converged design. Discussions around the deployment scenarios will help drive how any layering will be done between Knox at the perimeter and HSSO and TAS inside the cluster.

          I do have some questions about the authorization approach as well. We can keep that as a separate discussion - probably better had on the HADOOP-9466 Jira?

          I will look into having a bridge available for the get together.

          Kai Zheng added a comment -

          Thomas - The Identity Token represents the identity and will be used to access all Hadoop services. However, any service can have its own TAS deployed to authenticate the identity token; of course, in most cases they can share the same TAS instance. The token does have an expiration time, and a user can also issue a token-expire command to expire it earlier. The token can be cached, and the cache expires at a configurable interval.
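The client-side caching behavior described here could look something like the sketch below. The class name, method names, and the 4-hour default are all illustrative assumptions, not the actual design.

```python
import time

# Sketch of a client-side identity-token cache: entries expire after a
# configurable interval, and a user can expire a token early. All names
# here are hypothetical.

class TokenCache:
    def __init__(self, ttl_seconds=4 * 3600):   # "a few hours" (assumed default)
        self.ttl = ttl_seconds
        self._entries = {}   # realm -> (token, cached_at)

    def put(self, realm, token, now=None):
        now = time.time() if now is None else now
        self._entries[realm] = (token, now)

    def get(self, realm, now=None):
        """Return the cached token, or None if absent or expired."""
        now = time.time() if now is None else now
        entry = self._entries.get(realm)
        if entry is None:
            return None
        token, cached_at = entry
        if now - cached_at >= self.ttl:
            del self._entries[realm]      # expired: forces re-authentication
            return None
        return token

    def expire(self, realm):
        """Explicit early expiry (the 'token expire' command)."""
        self._entries.pop(realm, None)

cache = TokenCache(ttl_seconds=3600)
cache.put("example-realm", "tok-123", now=0.0)
print(cache.get("example-realm", now=100.0))   # tok-123
print(cache.get("example-realm", now=4000.0))  # None (expired)
```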

          Kai Zheng added a comment -

          Larry – Thanks. We can continue this discussion when the design document updates are posted here.

          We do enumerate a few high-level use cases and requirements targeted by TokenAuth in our design slides; various deployment modes and scenarios are also there. What are the relevant use cases and scenarios that HSSO targets? Particularly the part about in-cluster deployment. We can rationalize some of that here now, to see what can be handled by TokenAuth and TAS. It is not necessary to push all discussion to the meetup at the summit; the time we will have there for discussion is finite anyway.

          Your questions are welcome on the authorization approach. Let’s do it in HADOOP-9466.

          The bridge would be very much appreciated.

          Kyle Leckie added a comment -

          Kai - Thanks for the proposal. A few questions:

          1) Is it correct that the identity token is not restricted to a particular service and the same token is valid for all services that trust the token realm?

          2) Is it correct that the identity token is utilized as a bearer token and not a shared secret?

          3) You have repeatedly grouped LDAP and AD as the same form of authentication. Can you clarify the meaning of this grouping?

          Daryn Sharp added a comment -

          This is great information but a bit overwhelming. I have so many questions, I'm not sure where to start. Bear with me, because I'm sure I missed some details.

          The client generates its own id token? While this can be verified with PKI, what prevents another service from stealing the id token for its own use?

          I'm concerned about the client embedding its groups into the id token. The client is always untrustworthy, so it's really the server that should determine and enforce groups.

          I'm unclear how existing auth, like kerberos, fits into this model. Does the client do kerberos auth with its TGT to the authn server before generating its id token? If so, what prevents the client from skipping that step? A kerberos callback is mentioned. Does this mean the client is passing its TGT to another service?

          How does a server compare its id token with the client's id token? What's embedded in the client id token that allows this to occur?

          What does it mean that an authentication module is JAAS based but not necessarily the same?

          Additional cluster-specific conf settings should be avoided if possible. Config management for multi-cluster envs is already burdensome. Ideally we/hadoop should be moving towards a client being able to access multiple clusters with a single config.

          Has a meetup time at the summit been organized? I've scanned my mountain of email but didn't see one.

          Alejandro Abdelnur added a comment -

          +1 for a meetup

          Larry McCay added a comment -

          Hey Daryn - I share many of the same questions. I wasn't sure whether some of them were a flawed interpretation on my part or whether that was truly what was being proposed. It may still be the former but at least I'm not alone.

          We do have a meetup being planned for the Summit where we can hash out these sorts of issues. It seems to me that we will have to start with the threats and use cases that need to be accounted for rather than starting with a lot of implementation details. We will try and begin these discussions before the summit - as our time will be limited there. We'll try and make enough progress beforehand that we can set a manageable agenda for the meetup.

          We will post the meetup/design session details here and on the mailing lists once they are finalized.

          Kai Zheng added a comment -

          Kyle – Thanks for your questions.

          1) The client token needs to authenticate to a service when it’s used to access the service. We consider a token-realm trust based authenticator for the simple case and initial implementation. A user can implement and configure more advanced or enterprise-specific client token authenticators on a per-service basis. (This is as the doc states: “Other mechanisms other than trust method may be enforced for token realm to authenticate other token realms in future”.) However, enforcement of complicated client token authentication policies might introduce significant performance overhead when handshaking with the service. We may post an update about this and consider other approaches as our authorization work proceeds. Initially the TokenAuth design focuses on pluggable authentication for multiple domains and single sign on using a single token. Meanwhile, elsewhere we are also working on a Unified Authorization Framework. That authorization framework is where we would introduce constructs similar to the OAuth Access Token to enforce fine-grained access control, authorization trust management, and transference. That access token can be issued by another entity, some Authorization Server other than the TAS, targeting one service and maybe a set of resources. Very possibly, the resulting access token in the authorization framework can be aligned with the Service Access Token. In this way, the service’s authentication of the client token can be simplified, since the relevant policies for the service have already been enforced, prior to the service request, when issuing the access token.
          2) The Identity Token is a bearer token. It is signed and encrypted when issued by the TAS, and decrypted and verified when used by the service. The transport of the token is secured, either between the TAS and the client, or between the client and the service.
          3) About AD/LDAP: We simply group Active Directory and LDAP as examples of a family of related identity stores; both speak LDAP. When designing connectors to LDAP and AD as identity back ends for TokenAuth and the TAS, we intend to treat them as separate design targets on account of their differing capabilities. In particular, considering that Active Directory is dominant in enterprise deployments and is also evolving into an IdP solution on cloud platforms, it could make sense to integrate AD in a more advanced mode via an authentication module in the TAS. Such a module can be used in web contexts, where the web browser is redirected by HadoopTokenAuthnFilter to the AD IdP for authentication when accessing a Hadoop service via a web interface. On return, HadoopTokenAuthnHandler will present the resulting IdP token to the authentication module and the TAS to exchange it for an identity token, and the resulting identity token will then be used to request the corresponding resources.

          Kyle Leckie added a comment -

          Thanks for your thorough response, Kai.

          1) I agree on having support for tokens with pluggable token validation. Having the token contain an audience property in order to limit its scope should not add significant overhead but I take your point about having an initial implementation and progressing from there on an as needed basis.
          2), 3) Thanks for the clarification.

          It seems that supporting pluggable token validation is a significant feature in itself and the TAS work can be layered on top. What do you think of having the token validation and transmission as a separate JIRA?

          Kyle
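          As an editorial sketch of the audience idea above: the token carries the list of services it was issued for, and each service accepts it only if that list names the service. All class, field, and service names here are hypothetical.

```java
import java.util.List;

// Hypothetical sketch of an "audience" property on a bearer token,
// as suggested above. Names are illustrative only.
public class AudienceCheck {

    // A minimal token record: a subject plus the audiences it was issued for.
    static final class IdentityToken {
        final String subject;
        final List<String> audience;
        IdentityToken(String subject, List<String> audience) {
            this.subject = subject;
            this.audience = audience;
        }
    }

    // A service rejects any token whose audience list does not name it,
    // limiting the blast radius of a leaked bearer token.
    static boolean acceptedBy(String serviceId, IdentityToken t) {
        return t.audience.contains(serviceId);
    }

    public static void main(String[] args) {
        IdentityToken t = new IdentityToken("alice",
                List.of("hdfs-nn", "yarn-rm"));
        System.out.println(acceptedBy("hdfs-nn", t));      // prints true
        System.out.println(acceptedBy("hbase-master", t)); // prints false
    }
}
```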

          Andrew Purtell added a comment -

          When is the meetup?

          Kevin Minder added a comment -

          I'm happy to announce that we have secured a time slot and dedicated space during Hadoop Summit NA for forward-looking Hadoop security design collaboration. Currently, a room has been allocated on the 26th from 1:45 to 3:30 PT. The specific location will be available at the Summit, and any changes in date or time will be announced publicly to the best of our abilities. In order to create a manageable agenda for this session, I'd like to schedule some prep meetings via meetup.com to start discussions and preparations with those who would be interested in co-organizing the session.

          Kevin Minder added a comment -

          Logistics for remote attendance will also be announced publicly when we have that figured out. We won't be making any decisions about security at either the prep or the Summit sessions, and detailed summaries will be provided here for those who cannot attend.

          Sanjay Radia added a comment -

          Thanks for the Jira and the slides on what you are proposing.

          There is also no consistent delegation model. HDFS has a simple delegation capability, and only Oozie can take limited advantage of it. We will implement a common token based authentication framework to decouple internal user and service authentication from external mechanisms used to support it (like Kerberos)

          I am puzzled by the above statement. Hadoop has delegation tokens and a trust model. For the most part we use delegation tokens (e.g. the MR job client gets HDFS delegation tokens, etc.) so that the job can run as the user that submitted it. Further, in some cases we use trusted proxies like Oozie (but this can be any trusted service, not just Oozie) to access system services as specific users. The delegation tokens and the trusted proxies are two independent mechanisms. So I feel the statements in the quoted block are not correct, or perhaps you are using the term "delegation" in a different sense. Details of the Hadoop delegation tokens are in the following very detailed paper on Hadoop security (see http://hortonworks.com/wp-content/uploads/2011/10/security-design_withCover-1.pdf).

          You also state "We will implement a common token based authentication framework to decouple internal user and service authentication from external mechanisms used to support it (like Kerberos)". Note that the internal Hadoop tokens are separate from the Kerberos tokens - indeed they are nicely decoupled - so the problem, IMHO, is not the decoupling but other issues. One such issue is the authentication implementation, which has a burnt-in notion of supporting UGI, Kerberos, or the delegation tokens for authentication. As Daryn pointed out, this implementation needs to change so that the authentication mechanism is pluggable. This jira, I believe, is proposing much, much more than making this implementation pluggable.

          Don't get me wrong, I am not criticizing the Jira but merely trying to understand some of the statements in the description and the slides you have posted. I do agree that we need to allow other forms of authentication besides Kerberos/ActiveDir. I also agree that attributes like group membership should have been part of the Hadoop delegation tokens to avoid back calls, which are not feasible in cloud environments. Do you have a more detailed design besides the slides that you have uploaded to this jira? I would like to get the next level of details. Your comments in this jira do give some more details, but it would be good to put them in a design document. Further, I suspect you are trying to replace the Hadoop delegation tokens - I don't disagree with that but would like to understand the why and how from your perspective.

          Would this be an accurate description of this Jira: "Single sign on for non-kerberos environments using tokens". Hadoop does support "single sign on" for kerberos/activeDir environments; of course that is not good enough since many customers do not have Kerberos/ActiveDir.

          Kai Zheng added a comment -

          Kyle- thank you for thinking about this. I’m glad you like the pluggable token authenticator/validation. Yes, we also need to consider token transmission, as you mentioned, to protect the token from being leaked. It’s related to the current Hadoop RPC/SASL, and we need to find a SASL mechanism or other way to implement the TokenAuthn method in the current Hadoop security framework. It involves significant work and deserves to be addressed in a separate JIRA. I will open a subtask JIRA accordingly and let you know; then we can discuss this there.

          Daryn Sharp added a comment -

          After HADOOP-9421, and perhaps a few followup changes, new SASL auth methods will become much easier.

          Kai Zheng added a comment -

          Daryn- Sorry for the late response; your comments are great and very welcome.

          The identity token is issued by TAS when client authentication passes, and TAS is trusted by the Hadoop services. The token needs to authenticate to the service, and a pluggable client token authenticator/validator is allowed. The authenticator can be configured per service according to service-specific security policies to reject invalid tokens. As discussed with Kyle, we are considering an Access Token with audience restriction annotations. Of course the token should be protected from being leaked and used by another client/user, and we’ll discuss this separately.

          As mentioned above, TAS along with its issued tokens is trusted by Hadoop services/servers. The token with its attributes is encrypted and signed. As to which attributes should be contained in the identity token, we can discuss that separately. However, I don’t think group is anything special; if we employ fine-grained access control against other attributes like role, they should be important too. Identity attributes can come from various Attribute Authorities in the enterprise outside of the Hadoop cluster. Most importantly, we want to abstract all of this away from Hadoop into our proposed frameworks to simplify the configuration, deployment, and administration of large or multiple Hadoop clusters.

          Based on the TokenAuth framework, we plan to support the Kerberos mechanism via the KerberosTokenAuthnModule mentioned in the doc, and the module can be used to authenticate a TAS client via Kerberos. In this case the TAS client needs to pass Kerberos authentication first via kinit or a keytab, then authenticates to the authentication module with a service ticket, as when accessing any service, and finally gets an identity token. The mentioned callback for principal instead of ticket might not be used.

          The client identity token wraps identity attributes from the user, and the service identity token wraps service attributes and security policies specific to the service. As the default implementation, a token-realm-trust-based authenticator is used to validate the client token using the service’s token. As discussed with Kyle, a custom token validator can be plugged in per service to employ advanced validation mechanisms. Note we are considering an Access Token, and when it’s used this validation of the client token against the service token might not apply, so the token validator can be simplified.
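          To make the pluggable-validator idea concrete, here is an editorial sketch; the interface and class names are invented for illustration and are not from the design doc. The default realm-trust validator accepts a client token iff its issuing realm is trusted, and a service could swap in a stricter implementation.

```java
import java.util.Set;

// Hypothetical sketch of a per-service pluggable token validator,
// with realm trust as the default policy. All names are illustrative.
public class ValidatorSketch {

    // The plug-in point: a service picks (or provides) one of these.
    interface TokenValidator {
        boolean validate(String clientTokenRealm);
    }

    // Default policy: accept a client token iff the realm that issued it
    // is in this service's trusted-realm set.
    static final class RealmTrustValidator implements TokenValidator {
        private final Set<String> trustedRealms;
        RealmTrustValidator(Set<String> trustedRealms) {
            this.trustedRealms = trustedRealms;
        }
        public boolean validate(String clientTokenRealm) {
            return trustedRealms.contains(clientTokenRealm);
        }
    }

    public static void main(String[] args) {
        TokenValidator v = new RealmTrustValidator(Set.of("CORP.EXAMPLE"));
        System.out.println(v.validate("CORP.EXAMPLE")); // prints true
        System.out.println(v.validate("OTHER.REALM"));  // prints false
    }
}
```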

          Totally agree that we/Hadoop should simplify the security configuration and deployment for Hadoop. In a TokenAuth deployment, Hadoop only needs to be aware of TAS, without bothering to understand and configure concrete authentication providers. I agree we should support multiple clusters, so let’s see how TAS can be layered to provide the best support for that. Regarding keeping concrete configuration properties as simple as possible, I would like to discuss them separately.

          Sanjay Radia added a comment -

          I suspect you are trying to replace the Hadoop delegation tokens.

          Kai, assuming that you are planning to replace Hadoop delegation tokens, would this be done in 2 phases, where in phase 1 we would use the delegation tokens as is and simply use TAS for authentication? The same question for the block access token, which is really more of a capability rather than an authentication token. This is important to know because it will help decide whether or not to do the improvements to the existing delegation tokens that were planned.

          Kai Zheng added a comment -

          Hi Sanjay, thanks for your comments.

          You’re right, we are using the term “delegation” in a different, more generic way. Hadoop has delegation tokens for HDFS access that can be passed through to MR jobs. We are talking about delegating authentication and authorization in a pluggable way throughout the entire ecosystem. What we meant by inconsistent is the ecosystem coverage for delegation: it can’t be done everywhere; Hadoop delegation today is HDFS-centric.

          We did not mean to imply that Hadoop had no decoupling; instead we mean our framework will have this trait. Yes, you’re right that we imply other issues, and you might agree that the implementation of UGI should change so authentication mechanisms can plug in more easily. In my understanding, Daryn might be working on those issues related to allowing pluggable authentication mechanisms, but within the current framework. As you said, this jira proposes much more than this: it aims to support pluggable authentication mechanisms in TAS via a token-based TokenAuthn method in the current framework, so that Hadoop ecosystem components only need to talk to the token without understanding or involving concrete authentication mechanisms.

          Regarding the jira description, thanks for your suggestion. It’s not the whole story; we mean “Single Sign On for Kerberos or Non-Kerberos environments using tokens”. We want to extend what Hadoop can do today with Kerberos to encompass additional authenticators and identity providers.

          We don’t mean to replace the current Hadoop tokens (delegation token, block token, job token, etc.). In my view, they’re internal tokens; the TokenAuth token is more like a UGI equivalent, so we believe the new token can coexist with the old tokens, as the doc mentions and as also discussed previously with Thomas. I agree we might have two phases: in phase 1 we introduce TAS as authentication to external systems, trying not to change internal tokens. And in phase 2 we might improve those tokens, or better support such tokens utilizing the new authn & authz framework, if we find such possibilities or space.

          Daryn Sharp added a comment -

          What we meant by inconsistent is the ecosystem coverage for delegation: it can’t be done everywhere; Hadoop delegation today is HDFS-centric.

          That is not true. Delegation tokens are embedded at the RPC layer, so it's a capability that any service using the common RPC may use. YARN extensively uses the same delegation token framework, and MR uses it for the history server.

          Yes, you’re right that we imply other issues, and you might agree that the implementation of UGI should change so authentication mechanisms can plug in more easily. In my understanding, Daryn might be working on those issues related to allowing pluggable authentication mechanisms, but within the current framework.

          While the current design does have flaws, it can be adapted and improved in an incremental fashion. In a nutshell, after my changes: the client and server will be able to negotiate the authentication method, and use the mechanism specified by the server for that method. The client will essentially start to "just do what it's told" by the server. Eventually the UGI should be able to trigger an on-demand JAAS login for a given auth type, versus today's automatic login at instantiation. There's little reason why the UGI's auth type mapping cannot become dynamic perhaps via a service loader. I think this meshes with your goals.
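          The service-loader idea mentioned above could look roughly like the following hedged sketch. The AuthMethodProvider interface is invented here for illustration; nothing below reflects actual UGI internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical sketch of a dynamic auth-type mapping: auth method
// implementations discovered via java.util.ServiceLoader instead of
// being burnt into UGI. The interface name is illustrative.
public class AuthRegistry {

    // One abstract method, so lambdas work; a real provider would also
    // expose factories for the corresponding SASL client/server.
    public interface AuthMethodProvider {
        String authType(); // e.g. "KERBEROS", "TOKEN", "PLAIN"
    }

    private final Map<String, AuthMethodProvider> byType = new HashMap<>();

    // Providers on the classpath (registered under META-INF/services)
    // are picked up automatically; none are registered in this sketch.
    public AuthRegistry() {
        for (AuthMethodProvider p : ServiceLoader.load(AuthMethodProvider.class)) {
            byType.put(p.authType(), p);
        }
    }

    // Explicit registration, e.g. for tests or programmatic plugins.
    public void register(AuthMethodProvider p) {
        byType.put(p.authType(), p);
    }

    public AuthMethodProvider lookup(String authType) {
        return byType.get(authType);
    }
}
```

          A deployment would then drop a jar with a META-INF/services entry on the classpath to add an auth type, with no change to the core.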

          My high level goals are to provide better extensibility with minimal disruptive behavior.

          The original driver of my work is to allow security & tokens to be enabled w/o kerberos. This requires pluggable auth methods, in my case, adding support for SASL PLAIN as a substitute for SIMPLE. Providing security sans kerberos will allow developers and/or pre-commit to catch token related bugs so people like me don't have to chase so many of them. You can then plug in your own auth types, such as ldap, new tokens, etc.
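          As a hedged aside, the JDK already ships a client-side SASL PLAIN mechanism, so the kerberos-free handshake described above can be exercised today on the client end (server-side PLAIN is not in the JDK and would need a custom provider). The "hdfs" protocol and server name below are placeholders, not values from this proposal.

```java
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;

// Hedged sketch: drive the JDK's built-in SASL PLAIN client mechanism.
// The "hdfs" protocol and "nn.example.com" server names are placeholders.
public class PlainClientSketch {

    public static byte[] initialResponse(String user, String password)
            throws SaslException {
        // Supply the username and password when the mechanism asks.
        CallbackHandler handler = callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) {
                    ((NameCallback) cb).setName(user);
                } else if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword(password.toCharArray());
                }
            }
        };
        SaslClient client = Sasl.createSaslClient(
                new String[] {"PLAIN"}, null, "hdfs", "nn.example.com",
                null, handler);
        // Per RFC 4616 the initial response is [authzid] NUL authcid NUL passwd.
        return client.hasInitialResponse()
                ? client.evaluateChallenge(new byte[0]) : new byte[0];
    }
}
```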

          Another driver is adding SASL auth method negotiation to complement support for multiple auth methods, and extending this negotiation to enable new capabilities, e.g. heterogeneous security clusters/services, supporting multi-interface servers (which indirectly allows HA to use IP failover), etc.

          Given all the discussions involving more radical changes to the security framework, I'm very keen to providing the modularity required to implement these systems, but in a manner that will not destabilize the existing security implementation, else Yahoo's 2.x deployments may be delayed.

          Kevin Minder added a comment -

          Well said Daryn. Aligning this area of work across all interested parties is critical. We need to be able to clearly articulate the goals of the effort and then understand how we can all work together to accomplish them without duplicate, conflicting work and destabilizing Hadoop. In the coming days I hope to put some structure around this with the goal of having a meaningful conversation about this at Hadoop Summit. The good thing is that at least from my perspective it seems that our goals are aligned. Specifically we would like Hadoop to support pluggable mechanisms for the authentication of both users and services. We all have different ideas and are approaching this from different angles. We need to figure out how all the puzzle pieces fit together.

          Andrew Purtell added a comment -

          Currently, a room has been allocated on the 26th from 1:45 to 3:30 PT. Specific location will be available at the Summit and any changes in date or time will be announced publicly to the best of our abilities. In order to create a manageable agenda for this session, I'd like to schedule some prep meetings via meetup.com.

          Kevin Minder Is there a link to that meetup group?

          Kai Zheng added a comment -

          Daryn –

          Delegation tokens are embedded at the RPC layer, so it's a capability that any service using the common RPC may use.

          Thanks for the clarification. Yes, that part was misspoken. The term ‘delegation’ is being overloaded here. The relevant fact is that delegation can be done only where Hadoop RPC is used. We will update the document to be clearer about the issues of delegation.

          Given all the discussions involving more radical changes to the security framework, I'm very keen to providing the modularity required to implement these systems, but in a manner that will not destabilize the existing security implementation, else Yahoo's 2.x deployments may be delayed.

          Agreed. The proposal here implements Hadoop side changes using SASL and Hadoop RPC of today as a starting point, with a requirement that the end result remains backwards compatible and interoperable with existing deployments.

          Kevin –

          Aligning this area of work across all interested parties is critical. We need to be able to clearly articulate the goals of the effort and then understand how we can all work together to accomplish them without duplicate, conflicting work and destabilizing Hadoop. […] We all have different ideas and are approaching this from different angles. We need to figure out how all the puzzle pieces fit together.

          This is exactly what we hoped opening this JIRA would spark, and we would very much like the whole community of interested parties to work in a cooperative way. In addition to putting up an agenda for the summit meetup to bring some structure, bringing all related discussion under the umbrella of this JIRA would perhaps be helpful in getting everyone working together.

          Kevin Minder added a comment -

          Although meetup.com was recommended to me as a mechanism for scheduling a discussion, that doesn't really seem like it will work since this needs to be a virtual meeting. I've scheduled a Google Hangout for 12pm PT on Wednesday 6/12. https://plus.google.com/hangouts/_/calendar/a2V2aW4ubWluZGVyQGhvcnRvbndvcmtzLmNvbQ.qa0og2a0gaag9djeviv2rai63c
          I'm happy to move this around based on availability of those interested. I'm just not sure of the timezones involved. You can email my apache account (kminder at apache) or my jira profile address if you don't want that info here.

          At any rate, for this "pre-meeting", I'd like to discuss what everyone would like to get out of our time at the Summit and how we can prepare in advance. To seed this, I think there are a few things we need to nail down before we get there.
          1) The scope of the discussion
          2) The basic goals/requirements from various perspectives
          3) Agreement on the design discussion logistics (we only have two hours)

          At Summit we can:
          1) Discuss design approaches. I want to stress that these discussions need to be at a fairly high level given the time allocation. Ideally we would have been able to cover this already here but we are rapidly running out of time.
          2) Discuss a general implementation approach for any change of this nature
          3) Discuss rollout expectations (e.g. Hadoop ?.?)

          Kevin Minder added a comment -

          I also added this Google Hangout for the meeting today here: http://gphangouts.com/google/hangout/general/109294359812907561436/

          Larry McCay added a comment -

          A thank you to those who attended the prep call yesterday for the summit security session. While not all interested parties were able to make it to this call, we were able to lay some groundwork for moving forward. We intend to schedule another call for next week at a more globally appropriate time. In the meantime, the following is a summary of yesterday's call and should be used to frame the agenda for the next call.

          Prep-call Summary

          Introductions

          Community driven collaboration examples

          • HDFS-HA as a successful model
          • break out concrete areas that can be worked on by different parties but are aligned and complementary
          • HDFS-HA apparently did this between at least two contributing parties with functionality separated into things like:
            a. client failover/recovery
            b. transaction journalling to support the recovery

          Roadmap to prepare for summit:

          • Describe overall end-state goals for the Hadoop Security Model for Authentication (keep the scope focused on authn)
          • Canonical security concerns and threats for an authentication system that is an alternative to Kerberos
            a. add as document or subtask of https://issues.apache.org/jira/browse/HADOOP-9621
          • Describe the various tasks/projects that are required for reaching our goals
          • reconcile existing Jiras as subtasks of others as appropriate

          Ideally at summit we will be able to focus on:

          • Identify a phased approach to reaching our goals
          • Identify the best form of collaboration model for the effort
          • Identify natural seams of separation for collaboration
          • Interested contributors commit to specific aspects of the effort
          Larry McCay added a comment -

          As was pointed out on 9533 - the summary above is merely a description of what was discussed on the call for preparing for the security session at summit. No decisions have been made and we can/should discuss what the next call agenda should be. All decisions will be made through public communication such as these Jiras or dev-common list. Sorry if that wasn't clear in the above summary post.

          Kevin Minder added a comment -

          I'd like to provide another opportunity for anyone interested to discuss and prepare for the DesignLounge @ HadoopSummit session on security. I'll have a WebEx running at 5pm PT / 8pm ET / 8am CT. As before, this will just be a discussion (no decisions), and we will summarize here following the meeting. Here is the proposed agenda.

          • Introductions
          • Summarize previous call
          • Discuss goals/agenda/logistics for security DesignLounge@HadoopSummit session
          • Plan required preparatory material for the session

          WebEx details
          -------------------------------------------------------
          Meeting information
          -------------------------------------------------------
          Topic: Hadoop Security
          Date: Wednesday, June 19, 2013
          Time: 5:00 pm, Pacific Daylight Time (San Francisco, GMT-07:00)
          Meeting Number: 625 489 526
          Meeting Password: HadoopSecurity

          -------------------------------------------------------
          To start or join the online meeting
          -------------------------------------------------------
          Go to https://hortonworks.webex.com/hortonworks/j.php?ED=256673687&UID=508554752&PW=NZDdjOTcyNzdi&RT=MiM0

          -------------------------------------------------------
          Audio conference information
          -------------------------------------------------------
          To receive a call back, provide your phone number when you join the meeting, or call the number below and enter the access code.
          Call-in toll-free number (US/Canada): 1-877-668-4493
          Call-in toll number (US/Canada): 1-650-479-3208
          Global call-in numbers: https://hortonworks.webex.com/hortonworks/globalcallin.php?serviceType=MC&ED=256673687&tollFree=1
          Toll-free dialing restrictions: http://www.webex.com/pdf/tollfree_restrictions.pdf

          Access code:625 489 526

          -------------------------------------------------------
          For assistance
          -------------------------------------------------------
          1. Go to https://hortonworks.webex.com/hortonworks/mc
          2. On the left navigation bar, click "Support".
          To add this meeting to your calendar program (for example Microsoft Outlook), click this link:
          https://hortonworks.webex.com/hortonworks/j.php?ED=256673687&UID=508554752&ICS=MS&LD=1&RD=2&ST=1&SHA2=AAAAAtYvvV8MU/6na1FmVxgxSUcpUBRMQ62CB-UdrJ15Wywo

          To check whether you have the appropriate players installed for UCF (Universal Communications Format) rich media files, go to https://hortonworks.webex.com/hortonworks/systemdiagnosis.php.

          http://www.webex.com

          CCM:+16504793208x625489526#

          IMPORTANT NOTICE: This WebEx service includes a feature that allows audio and any documents and other materials exchanged or viewed during the session to be recorded. You should inform all meeting attendees prior to recording if you intend to record the meeting. Please note that any such recordings may be subject to discovery in the event of litigation.

          Kevin Minder added a comment -

          Relevant security related docs attached to this HADOOP-9621

          Kevin Minder added a comment -

          Here is a summary of the discussion we had during the above call.

          Attendees: Andrew Purtell, Brian Swan, Benoy Antony, Avik Dey, Kai Zheng, Kyle Leckie, Larry McCay, Kevin Minder, Tianyou Li

          – Goals & Perspective –

          Hortonworks

          • Plug into any enterprise Idp infrastructure
          • Enhance Hadoop security model to better support perimeter security
          • Align client programming model for different Hadoop deployment models

          Microsoft

          • Support pluggable identity providers: ActiveDirectory, cloud and beyond
          • Enhance user isolation within Hadoop cluster

          Intel

          • Support token based authentication
          • Support fine grained authorization
          • Seamless identity delegation at every layer
          • Support single sign on: from user's desktop, between Hadoop clusters
          • Pluggable at every level
          • Provide a security "toolkit" that would be integrated across the ecosystem
          • Must be backward compatible
          • Must take both RPC and HTTP into account and should follow common model

          eBay

          • Integrate better with eBay SSO
          • Provide SSO integration at RPC layer

          – Summit Planning –

          • Think of Summit session as a "meet and greet" and "Kickoff" of cross cutting security community
          • Create a new Jira to collect high-level use cases, goals and usability
          • Use time at summit to approach design at a whiteboard from a "clean slate" perspective against those use cases and goals
          • Get a sense of how we can divide and conquer the problem space
          • Figure out how best to collaborate
          • Figure out how we can all get "hacking" on this ASAP

          – Ideas –

          • Foster a security community within the Hadoop community
          • Suggest creating a focused security-dev type community mailing list
          • Suggest creating a wiki area devoted to overall security efforts
          • Ideally, current independent designs will inform a collaborative design, pulling in the best of existing code to accelerate
          • Link the security doc Jira HADOOP-9621 to other related security Jiras

          – Questions –

          • What would a central token authority (i.e. HSSO) provide beyond the work that is already being done?
          • HADOOP-9479 (Benoy Antony)
          • HADOOP-8779 (Daryn Sharp)
          • How can HSSO and TAS work together? What is the relationship?
          Larry McCay added a comment -
          Summit Summary -

          Last week at Hadoop Summit there was a room dedicated as the summit Design Lounge. This was a place where folks could get together and talk about design issues with other contributors with a simple flip-board and some beanbag chairs. We used this as an opportunity to bootstrap some discussions within common-dev for security related topics. I'd like to summarize the security session and takeaways here for everyone.

          This summary and set of takeaways are largely from memory.
          Please feel free to correct anything that is inaccurate or omitted.

          Pretty well attended - I don't recall all the names, but some of the companies represented:

          • Yahoo!
          • Microsoft
          • Hortonworks
          • Intel
          • eBay
          • Voltage Security
          • Flying Penguins
          • EMC
          • others...

          We set expectations as a meet and greet/project kickoff - project being the emerging security development community.
          Most folks were pretty engaged throughout the session.

          In order to keep the scope of conversations manageable we tried to remain focused on authentication and the ideas around SSO and tokens.

          We discussed kerberos as:
          1. major pain point and barrier to entry for some
          2. seemingly perfect for others
          a. obviously requiring backward compatibility

          It seemed to be consensus that:
          1. user authentication should be easily integrated with alternative enterprise identity solutions
          2. that service identity issues should not require thousands of service identities added to enterprise user repositories
          3. that customers should not be forced to install/deploy and manage a KDC for services - this implies a couple options:
          a. alternatives to kerberos for service identities
          b. hadoop KDC implementation - ie. ApacheDS?

          There was active discussion around:
          1. Hadoop SSO server
          a. acknowledgement of Hadoop SSO tokens as something that can be standardized for representing both the identity and authentication event data, as well as access tokens representing a verifiable means for the authenticated identity to access resources or services
          b. a general understanding of Hadoop SSO as being an analogue and alternative for the kerberos KDC and the related tokens being analogous to TGTs and service tickets
          c. an agreement that there are interesting attributes about the authentication event that may be useful in cross cluster trust for SSO - such as a rating of authentication strength and number of factors, etc
          d. that existing Hadoop tokens - ie. delegation, job, block access - will all continue to work and that we are initially looking at alternatives to the KDC, TGTs and service tickets
          2. authentication mechanism discovery by clients - Daryn Sharp has done a bunch of work around this and our SSO solution may want to consider a similar mechanism for discovering trusted IDPs and service endpoints
          3. backward compatibility - kerberos shops need to just continue to work
          4. some insight into where/how folks believe that token based authentication can be accomplished within existing contracts - SASL/GSSAPI, REST, web ui
          5. the establishment of a cross-cutting security community within the Hadoop community and what that means in terms of the Apache way - email lists, wiki, Jiras across projects, etc
          6. dependencies, rolling updates, patching and how they relate to Hadoop projects versus packaging
          7. collaboration road ahead
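To make points 1a–1d above a bit more concrete, here is a minimal sketch of what a standardized, verifiable SSO token with pluggable validation might look like. This is purely illustrative: the names (`IdentityToken`, `TokenValidator`, `issue`, `hmacValidator`), the HMAC-signed format, and the shared-secret scheme are assumptions for the sake of the example, not anything from Hadoop or the design documents; a real design would also have to settle wire format, key distribution, and revocation.

```java
// Hypothetical sketch only: class and method names are illustrative,
// not part of any Hadoop API or the TokenAuth/HSSO proposals.
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class TokenSketch {
    /** An identity token carrying the authentication-event attributes discussed above. */
    public static final class IdentityToken {
        final String principal;   // authenticated user
        final long expiryMillis;  // token lifetime
        final int authnStrength;  // e.g. number of authentication factors
        final String signature;   // issuer's HMAC over the other fields

        IdentityToken(String principal, long expiryMillis, int authnStrength, String signature) {
            this.principal = principal;
            this.expiryMillis = expiryMillis;
            this.authnStrength = authnStrength;
            this.signature = signature;
        }
    }

    /** Pluggable verification: a service checks tokens without contacting the issuer. */
    public interface TokenValidator {
        boolean validate(IdentityToken token, long nowMillis);
    }

    static String hmac(byte[] key, String payload) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return Base64.getEncoder()
                .encodeToString(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
    }

    /** Issue a token signed with a secret shared across the cluster. */
    public static IdentityToken issue(byte[] secret, String principal,
                                      long expiryMillis, int strength) throws Exception {
        String payload = principal + "|" + expiryMillis + "|" + strength;
        return new IdentityToken(principal, expiryMillis, strength, hmac(secret, payload));
    }

    /** One possible validator implementation: recompute and compare the HMAC. */
    public static TokenValidator hmacValidator(byte[] secret) {
        return (t, now) -> {
            try {
                String payload = t.principal + "|" + t.expiryMillis + "|" + t.authnStrength;
                return now < t.expiryMillis && hmac(secret, payload).equals(t.signature);
            } catch (Exception e) {
                return false;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        byte[] secret = "cluster-shared-secret".getBytes(StandardCharsets.UTF_8);
        IdentityToken t = issue(secret, "alice", System.currentTimeMillis() + 60_000, 2);
        System.out.println(hmacValidator(secret).validate(t, System.currentTimeMillis()));
    }
}
```

The `TokenValidator` interface is the "pluggable" part: an HMAC check against a distributed secret is only one option, and a deployment could substitute a validator that calls back to a central token authority or verifies a public-key signature instead.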

          A number of breakout discussions were had outside of the designated design lounge session as well.

          Takeaways for the immediate road ahead:
          1. common-dev may be sufficient to discuss security related topics
          a. many developers are already subscribed to it
          b. there is not that much traffic there anyway
          c. we can discuss a more security focused list if we like
          2. we will discuss the establishment of a wiki space for a holistic view of security model, patterns, approaches, etc
          3. we will begin discussion on common-dev in near-term for the following:
          a. discuss and agree on the high level moving parts required for our goals for authentication: SSO service, tokens, token validation handlers, credential management tools, etc
          b. discuss and agree on the natural seams across these moving parts and agree on collaboration by tackling various pieces in a divide and conquer approach
          c. more than likely - the first piece that will need some immediate discussion will be the shape and form of the tokens
          d. we will follow up or supplement discussions with POC code patches and/or specs attached to jiras

          Overall, design lounge was rather effective for what we wanted to do - which was to bootstrap discussions and collaboration within the community at large. As always, no specific decisions have been made during this session and we can discuss any or all of this within common-dev and on related jiras.

          Jiras related to the security development group and these discussions:

          Centralized SSO/Token Server https://issues.apache.org/jira/browse/HADOOP-9533
          Token based authentication and SSO https://issues.apache.org/jira/browse/HADOOP-9392
          Document/analyze current Hadoop security model https://issues.apache.org/jira/browse/HADOOP-9621
          Improve Hadoop security - Use cases https://issues.apache.org/jira/browse/HADOOP-9671

          Show
          Larry McCay added a comment - Summit Summary - Last week at Hadoop Summit there was a room dedicated as the summit Design Lounge. This was a place where folks could get together and talk about design issues with other contributors with a simple flip-board and some beanbag chairs. We used this as an opportunity to bootstrap some discussions within common-dev for security related topics. I'd like to summarize the security session and takeaways here for everyone. This summary and set of takeaways are largely from memory. Please feel free to correct anything that is inaccurate or omitted. Pretty well attended - don't recall all the names but some of the companies represented: Yahoo! Microsoft Hortonworks Intel eBay Voltage Security Flying Penguins EMC others... We set expectations as a meet and greet/project kickoff - project being the emerging security development community. Most folks were pretty engaged throughout the session. In order to keep the scope of conversations manageable we tried to remain focused on authentication and the ideas around SSO and tokens. We discussed kerberos as: 1. major pain point and barrier to entry for some 2. seemingly perfect for others a. obviously requiring backward compatibility It seemed to be consensus that: 1. user authentication should be easily integrated with alternative enterprise identity solutions 2. that service identity issues should not require thousands of service identities added to enterprise user repositories 3. that customers should not be forced to install/deploy and manage a KDC for services - this implies a couple options: a. alternatives to kerberos for service identities b. hadoop KDC implementation - ie. ApacheDS? There was active discussion around: 1. Hadoop SSO server a. 
acknowledgement of Hadoop SSO tokens as something that can be standardized for representing both the identity and authentication event data as well and access tokens representing a verifiable means for the authenticated identity to access resources or services b. a general understanding of Hadoop SSO as being an analogue and alternative for the kerberos KDC and the related tokens being analogous to TGTs and service tickets c. an agreement that there are interesting attributes about the authentication event that may be useful in cross cluster trust for SSO - such as a rating of authentication strength and number of factors, etc d. that existing Hadoop tokens - ie. delegation, job, block access - will all continue to work and that we are initially looking at alternatives to the KDC, TGTs and service tickets 2. authentication mechanism discovery by clients - Daryn Sharp has done a bunch of work around this and our SSO solution may want to consider a similar mechanism for discovering trusted IDPs and service endpoints 3. backward compatibility - kerberos shops need to just continue to work 4. some insight into where/how folks believe that token based authentication can be accomplished within existing contracts - SASL/GSSAPI, REST, web ui 5. what the establishment of a cross cutting concern community around security and what that means in terms of the Apache way - email lists, wiki, Jiras across projects, etc 6. dependencies, rolling updates, patching and how it related to hadoop projects versus packaging 7. collaboration road ahead A number of breakout discussions were had outside of the designated design lounge session as well. Takeaways for the immediate road ahead: 1. common-dev may be sufficient to discuss security related topics a. many developers are already subscribed to it b. there is not that much traffic there anyway c. we can discuss a more security focused list if we like 2. 
we will discuss the establishment of a wiki space for a holistic view of security model, patterns, approaches, etc 3. we will begin discussion on common-dev in near-term for the following: a. discuss and agree on the high level moving parts required for our goals for authentication: SSO service, tokens, token validation handlers, credential management tools, etc b. discuss and agree on the natural seams across these moving parts and agree on collaboration by tackling various pieces in a divide and conquer approach c. more than likely - the first piece that will need some immediate discussion will be the shape and form of the tokens d. we will follow up or supplement discussions with POC code patches and/or specs attached to jiras Overall, design lounge was rather effective for what we wanted to do - which was to bootstrap discussions and collaboration within the community at large. As always, no specific decisions have been made during this session and we can discuss any or all of this within common-dev and on related jiras. Jiras related to the security development group and these discussions: Centralized SSO/Token Server https://issues.apache.org/jira/browse/HADOOP-9533 Token based authentication and SSO https://issues.apache.org/jira/browse/HADOOP-9392 Document/analyze current Hadoop security model https://issues.apache.org/jira/browse/HADOOP-9621 Improve Hadoop security - Use cases https://issues.apache.org/jira/browse/HADOOP-9671
          Larry McCay added a comment -

          Just realized that I failed to mention that Cloudera was also represented - sorry Aaron!

          Aaron T. Myers added a comment -

          No sweat. Tucu and I just figured we were part of the "others."

          Kai Zheng added a comment -

          TokenAuth design updated.

          Kai Zheng added a comment -

          We just updated the TokenAuth design; please help review the new revision. It incorporates feedback and suggestions from related discussions in the community, particularly from Microsoft and others attending the security design lounge session at the Hadoop Summit. Summary of the changes:
          1. Revised the approach to use two tokens, an Identity Token plus an Access Token, particularly considering our authorization framework and compatibility with HSSO;
          2. Introduced the Authorization Server (AS) from our authorization framework into the flow; it issues access tokens that clients holding identity tokens use to access services;
          3. Refined the proxy access token and the proxy/impersonation flow;
          4. Refined the browser web SSO flow for access to Hadoop web services;
          5. Added a Hadoop RPC access flow for CLI clients accessing Hadoop services via RPC/SASL;
          6. Added a client authentication integration flow to illustrate how desktop logins can be integrated into the authentication process with TAS to exchange an identity token;
          7. Introduced the fine-grained access control flow from the authorization framework; I have put it in the appendices section for reference;
          8. Added a detailed flow illustrating Hadoop simple authentication over TokenAuth, in the appendices section;
          9. Added a secured task launcher in the appendices, considering possible solutions for the Windows platform;
          10. Moved low-level content and less relevant parts from the main body into the appendices section.

          Thanks for your comments and feedback.
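          As a rough illustration of change 1 above, the two-token hand-off can be sketched as below. All class and method names here are hypothetical stand-ins, not actual Hadoop or TokenAuth APIs; the point is only the shape of the flow: credentials go to the TAS and yield an identity token, and the identity token goes to the AS and yields a service-scoped access token.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Identity Token + Access Token flow.
// None of these class names exist in Hadoop; they only mirror the roles
// described in the design doc (TAS issues identity tokens, AS issues
// access tokens, services then verify access tokens).
public class TwoTokenFlowSketch {

    static class TokenAuthService {               // stands in for the TAS
        private final Map<String, String> issued = new HashMap<>();

        String authenticate(String user, String credential) {
            // A real TAS would delegate to a pluggable IdP here.
            String identityToken = "id-token-for-" + user;
            issued.put(identityToken, user);
            return identityToken;
        }

        String ownerOf(String identityToken) {
            return issued.get(identityToken);
        }
    }

    static class AuthorizationServer {            // stands in for the AS
        private final TokenAuthService tas;

        AuthorizationServer(TokenAuthService tas) { this.tas = tas; }

        String issueAccessToken(String identityToken, String service) {
            String user = tas.ownerOf(identityToken);
            if (user == null) {
                throw new SecurityException("unknown identity token");
            }
            // A coarse-grained, centrally managed policy check would go here.
            return "access-" + service + "-for-" + user;
        }
    }

    // Convenience wrapper showing the whole hand-off in one call.
    static String demoFlow(String user, String service) {
        TokenAuthService tas = new TokenAuthService();
        AuthorizationServer as = new AuthorizationServer(tas);
        String identityToken = tas.authenticate(user, "secret");
        return as.issueAccessToken(identityToken, service);
    }

    public static void main(String[] args) {
        // The service (e.g. HDFS) would verify this access token and then
        // still apply its own fine-grained checks.
        System.out.println(demoFlow("alice", "hdfs"));
    }
}
```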

          Larry McCay added a comment -

          Thanks for the update, Kai.
          This revision is much more concise than the previous PPT deck and has certainly incorporated ideas from the discussions that were started at the summit.

          I would encourage you to bring your comments to the DISCUSS thread that was started on common-dev for identifying the moving pieces. Our next step will be determining the dependencies that need to be done up front and breaking up the remaining pieces for implementation into assignable subtasks.

          I will reserve comment on the details of this new paper for when detailed discussions for those subtasks begin. In general, I think what I read is pretty well aligned with what others are thinking.

          Thomas NGUY added a comment -

          Thanks Kai, I'll try to take some time to catch up on everything. Regards.

          Kai Zheng added a comment -

          Sure thing Larry. A lot of these updates predate last week’s discussion at the summit. Fortunately the discussion at the summit was in line with our thinking on the required revisions from discussing with others in the community prior to the summit.

          The DISCUSS thread on common-dev is great for approach discussions; thank you for starting it. For design discussions and implementation feedback, in keeping with how it’s done on most JIRAs in Apache, let's keep things focused on the JIRA. That way everything is in one place for everyone to read on the JIRA itself, and folks don’t have to hunt around for emails later.

          I am looking forward to comments from everyone in the community. I request that the community review this design revision and provide comments on the JIRA. Thanks in advance.

          Brian Swan added a comment -

          Thanks, Kai, for the updated design doc. I've spent some time reading it and have a few comments/questions:

          1. The new diagram (p. 3) that describes client/TAS/AS/IdP/Hadoop Services interaction shows a client providing credentials to TAS, which then provides the credentials to the IdP. From a security perspective, this seems like a bad idea. It defeats the purpose of having an IdP in the first place. Is this an oversight or by design?

          2. I'm not sure I understand why AS is necessary. It seems to complicate the design by adding an unnecessary authorization check - authorization can/should happen at individual Hadoop services based on token attributes. I think you have mentioned before that authorization (with AS in place) would happen at both places (some level of authz at AS and finer grained authz at services). Can you elaborate on what value that adds over doing authz at services only? And, can you provide an example of what authz checks would happen at each place? (Say I access NameNode. What authz checks are done at AS and what is done at the service?)

          3. I believe this has been mentioned before, but the scope of this document makes it very difficult to move forward with contributing code. It would be very helpful to understand how you envision breaking this down into work items that the community can pick up (I think this is what the DISCUSS thread on common-dev was attempting to do).

          To further my last point, from my perspective, one work item that fits into your design is that of adding token support to RPC endpoints. This is a work item that would add value for customers right away while still allowing flexibility in the rest of the design. This is something we would like to begin work on now (after consulting Daryn Sharp, since I understand he's been doing some work in this area). However, it's not clear to me (based on comments in the DISCUSS thread on common-dev) if you are already writing code for this. It would be unfortunate to duplicate work here. If you have something concrete to share, that would be great.

          Thanks.

          James C. Wu added a comment -

          On page 9 of the new document (Hadoop web browser access), I am concerned about step 7. When the browser authenticates to the remote IdP and forwards the results to the TAS for a Hadoop identity token, how can the TAS be sure that the authentication result sent by the browser is not faked?

          Tianyou Li added a comment -

          Hi Brian,

          Thanks for reviewing and providing feedback on the design. You have asked some good questions, so let me try to add some more context on the design choices and why we made them. Hopefully this additional context will provide some clarity. Please feel free to ask if you still have questions or concerns.

          > 1. The new diagram (p. 3) that describes client/TAS/AS/IdP/Hadoop Services interaction shows a client providing credentials to TAS, which then provides the credentials to the IdP. From a security perspective, this seems like a bad idea. It defeats the purpose of having an IdP in the first place. Is this an oversight or by design?

          From the client's point of view, the TAS should be trusted for authentication; whether client credentials can be passed to the TAS directly depends on the IdP's capabilities, deployment decisions, etc. If the IdP can generate a token and is federated with the TAS, then that token can be used to authenticate with the TAS and obtain an identity token for the Hadoop cluster. If the IdP cannot generate a trusted token (e.g. LDAP), then there are several alternative solutions depending on the deployment scenario.

          In the first scenario, the TAS and the IdP are deployed in the same organization on the same network, and the TAS can access the IdP directly. Here credentials are passed to the TAS securely (over SSL) and the TAS then passes them to an IdP such as LDAP. In the second scenario, the TAS and the IdP are deployed on different networks and the TAS cannot contact the IdP directly; for example, the LDAP server resides inside the enterprise while the TAS is deployed in the cloud, and the client is trying to access the cluster from the enterprise. In this scenario, an agent trusted by the client can be deployed to collect client credentials, pass them to LDAP (the IdP), and present a token to the external TAS to complete the authentication process. This agent can be another TAS. The third scenario is similar to the second, except that the client is trying to access the cluster from a public network (for example, a cloud environment) but needs to use the enterprise LDAP as the IdP. In this scenario, an agent (which can be a TAS) is deployed as a gateway on the enterprise side to collect credentials.

          In any of the above scenarios, for an IdP that cannot generate a token as the result of authentication, the TAS can act as the agent trusted by the client to collect credentials for first-mile authentication. These considerations led us to draw the flow as shown on page 3.

          > 2. I'm not sure I understand why AS is necessary. It seems to complicate the design by adding an unnecessary authorization check - authorization can/should happen at individual Hadoop services based on token attributes. I think you have mentioned before that authorization (with AS in place) would happen at both places (some level of authz at AS and finer grained authz at services). Can you elaborate on what value that adds over doing authz at services only? And, can you provide an example of what authz checks would happen at each place? (Say I access NameNode. What authz checks are done at AS and what is done at the service?)

          I agree that authorization can be pushed to the service side, but centralized authorization has some advantages. For example, any authZ policy change can be enforced immediately instead of waiting for the policy to sync to each service, and it provides a centralized place for auditing client access. The centralized authZ acts much like service-level authZ, except it is centralized for the reasons just mentioned. (In the scenario you mentioned, to access the HDFS service you need an access token granted under the defined authZ policy; once you have the access token you can reach the HDFS service, but that does not mean you can access any file in HDFS. The file/directory-level access control is still done by HDFS itself.)
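          To make the NameNode example concrete, the split can be sketched as below. The policy contents and method names are invented for illustration; the point is only that the AS answers the coarse question (may this user reach this service at all?) from one central policy, while the service keeps enforcing its own per-file permissions.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical two-level authorization check: a central AS-style policy
// gating access tokens per service, plus the service's own per-path check
// (standing in for HDFS file/directory permissions).
public class TwoLevelAuthzSketch {

    // Central policy: user -> services they may obtain access tokens for.
    // Editing this one map takes effect immediately, with no per-service sync.
    static final Map<String, Set<String>> AS_POLICY = Map.of(
        "alice", Set.of("hdfs", "yarn"),
        "bob",   Set.of("yarn"));

    // Coarse-grained check performed by the AS before minting an access token.
    static boolean asGrantsAccessToken(String user, String service) {
        return AS_POLICY.getOrDefault(user, Set.of()).contains(service);
    }

    // Fine-grained check that stays inside the service itself.
    static boolean serviceAllowsPath(String user, String path) {
        return path.startsWith("/user/" + user + "/");
    }

    public static void main(String[] args) {
        // bob never reaches HDFS: the AS refuses to mint an access token.
        System.out.println(asGrantsAccessToken("bob", "hdfs"));
        // alice gets an access token, but still only sees her own files.
        System.out.println(asGrantsAccessToken("alice", "hdfs")
            && serviceAllowsPath("alice", "/user/alice/data.txt"));
    }
}
```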

          > 3. I believe this has been mentioned before, but the scope of this document makes it very difficult to move forward with contributing code. It would be very helpful to understand how you envision breaking this down into work items that the community can pick up (I think this is what the DISCUSS thread on common-dev was attempting to do).

          This one I am trying to understand a little better. Please help me understand what you mean by “… scope of this document makes it very difficult to move forward with contributing code.” If we were to break the JIRA down into a number of sub-tasks based on the document, would that be helpful?

          Regards.

          Brian Swan added a comment -

          Hi Tianyou-

          Maybe I should have listed my last comment/question first, as it was the most important to me: One work item that fits into your design is that of adding token support to RPC endpoints. This is a work item that would add value for customers right away while still allowing flexibility in the rest of the design. This is something we would like to begin work on now (after consulting Daryn Sharp, since I understand he's been doing some work in this area). However, it's not clear to me (based on comments in the DISCUSS thread on common-dev) if you are already writing code for this. It would be unfortunate to duplicate work here. If you have something concrete to share, that would be great.

          Regarding a client passing credentials to TAS: It seems that you are saying that a client would not pass credentials to TAS in all scenarios. This is not reflected in the diagram. I also am not sure what you mean by "TAS should be trusted by client for authentication". Trusting with credentials violates basic security principles, which I would not see as an improvement in Hadoop security.

          IMHO, the best way to get to a common understanding of the details here is with code or with a much more narrowly-scoped discussion (which is what I was trying to say in my point #3). I do think that breaking things down into sub-tasks is a good idea - the DISCUSS thread on common-dev that I mentioned before has a great start to this (by component).

          Thanks.

          Tianyou Li added a comment -

          Hi James,

          Thanks for reviewing. In the web SSO flow, the IdP usually issues a signed token to ensure data integrity. The token issued by the IdP as a result of authentication cannot be modified, because the signing key is a secret of the IdP; other parties cannot obtain the signing key, so they cannot produce a valid signature for an altered token.

          Moreover, once the client is redirected to the IdP for authentication, the client usually needs to verify and accept the server certificate as a step of establishing trust in the IdP via SSL (HTTPS); this ensures the credentials the client provides are routed to the trusted IdP over a secure channel. The TAS also needs to verify the signature of the token issued by that IdP; this step proves that the token was issued by the designated IdP and can be authenticated successfully by the TAS.

          As mentioned above, TLS/SSL should be enabled to protect credential transmission during the authentication process with the IdP and to mitigate MITM attacks. To further improve client authN security, multi-factor authentication such as an additional OTP can also be employed; this is one of our design goals but might not be explicitly mentioned.
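          The integrity argument above can be sketched with standard JDK crypto. This sketch uses a shared HMAC key for brevity; a real IdP would more likely sign with a private key (RSA/ECDSA) so the TAS only needs the IdP's public key. The token format here is invented for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch: the IdP signs the token payload; the TAS recomputes the signature
// and rejects any token whose payload was altered in the browser.
public class SignedTokenSketch {

    static String sign(String payload, byte[] key) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        byte[] sig = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
        // Base64url contains no '.', so '.' safely separates payload and signature.
        return payload + "." + Base64.getUrlEncoder().withoutPadding().encodeToString(sig);
    }

    static boolean verify(String token, byte[] key) throws Exception {
        int dot = token.lastIndexOf('.');
        if (dot < 0) {
            return false;
        }
        // Recompute the signature over the received payload and compare.
        return sign(token.substring(0, dot), key).equals(token);
    }

    public static void main(String[] args) throws Exception {
        byte[] idpKey = "idp-signing-secret".getBytes(StandardCharsets.UTF_8);
        String token = sign("user=alice;authnStrength=2", idpKey);
        System.out.println(verify(token, idpKey));  // genuine token verifies

        // A browser swapping in another identity cannot produce a valid signature.
        String forged = "user=mallory" + token.substring(token.lastIndexOf('.'));
        System.out.println(verify(forged, idpKey));
    }
}
```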

          Regards.

          Daryn Sharp added a comment -

          I'm still digesting the doc, but it's unclear to me whether a client in this architecture will be able to simultaneously access both SSO clusters and non-SSO (e.g. Kerberos) clusters. A large scale deployment such as Yahoo will never adopt a new security framework if a staged migration will break cross-cluster access.

          I'm a bit concerned about wanting to migrate the existing auth methods into a new framework. I'd prefer to see us try to plug in new auth methods to Java via javax service providers for SASL, HTTP, and JAAS. The more code we wrap around these standard facilities, the greater the chance that a security flaw will sneak in.

          I'm working on a doc to detail how I would envision SSO integrating in a manner that avoids these concerns.
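          The service-provider route described here can be sketched as below for SASL. The mechanism name "HADOOP-TOKEN" and the token bytes are made up for illustration, but the registration path is the standard JDK one: publish a SaslClientFactory through a java.security.Provider, and Sasl.createSaslClient discovers it with no wrapper code.

```java
import java.nio.charset.StandardCharsets;
import java.security.Provider;
import java.security.Security;
import java.util.Map;
import javax.security.auth.callback.CallbackHandler;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslClientFactory;

// Sketch: plugging a hypothetical token mechanism into SASL via the
// standard provider registry, rather than wrapping the SASL layer.
public class TokenSaslSketch {

    // Minimal client that just sends opaque token bytes as its initial response.
    public static class TokenSaslClient implements SaslClient {
        private boolean completed = false;

        public String getMechanismName() { return "HADOOP-TOKEN"; }
        public boolean hasInitialResponse() { return true; }
        public byte[] evaluateChallenge(byte[] challenge) {
            completed = true;
            return "opaque-access-token-bytes".getBytes(StandardCharsets.UTF_8);
        }
        public boolean isComplete() { return completed; }
        public byte[] wrap(byte[] b, int off, int len) {
            throw new IllegalStateException("no negotiated QOP");
        }
        public byte[] unwrap(byte[] b, int off, int len) {
            throw new IllegalStateException("no negotiated QOP");
        }
        public Object getNegotiatedProperty(String propName) { return null; }
        public void dispose() { }
    }

    public static class TokenSaslClientFactory implements SaslClientFactory {
        public SaslClient createSaslClient(String[] mechanisms, String authorizationId,
                String protocol, String serverName, Map<String, ?> props,
                CallbackHandler cbh) {
            for (String m : mechanisms) {
                if ("HADOOP-TOKEN".equals(m)) {
                    return new TokenSaslClient();
                }
            }
            return null;
        }
        public String[] getMechanismNames(Map<String, ?> props) {
            return new String[] { "HADOOP-TOKEN" };
        }
    }

    // Register the factory; from then on the JDK's Sasl entry points find it.
    public static void register() {
        Security.addProvider(new Provider("TokenSaslSketch", "1.0",
                "sketch of a token SASL mechanism") {{
            put("SaslClientFactory.HADOOP-TOKEN", TokenSaslClientFactory.class.getName());
        }});
    }

    public static void main(String[] args) throws Exception {
        register();
        SaslClient client = Sasl.createSaslClient(new String[] { "HADOOP-TOKEN" },
                null, "hdfs", "namenode.example.com", null, null);
        System.out.println(client != null ? client.getMechanismName() : "not found");
    }
}
```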

          Tianyou Li added a comment -

          Hi Brian,

          Please allow me to address your concerns first and then turn to your comments about how this work should proceed.

          > Regarding a client passing credentials to TAS: It seems that you are saying that a client would not pass credentials to TAS in all scenarios.This is not reflected in the diagram.

          In the scenarios mentioned, when the IdP cannot generate a trusted token, the TAS can be used as an agent to collect credentials and pass them to the IdP, which is exactly what the flow on page 3 reflects. Maybe we should add another diagram to address the scenarios of exchanging a token by supplying credentials to the IdP directly. Or we can use the term ‘identity data store’ for LDAP, DB, etc., to distinguish such an ‘identity backend’ from an ‘IdP’.

          > I also am not sure what you mean by "TAS should be trusted by client for authentication"

          That means that when the client initiates the authentication process with the TAS, the TAS should prove itself to the client. This can be done via an SSL/TLS server certificate, and enabling SSL/TLS also secures credential transmission in the subsequent authentication process.

          > Trusting with credentials violates basic security principles, which I would not see as an improvement in Hadoop security.

          We only do ‘trusting with credentials’ when necessary. In our experience supporting customers with identity data stores such as LDAP, DB, etc., the approach we present on page 3 is very common and can also be observed in PingFederate, SecureAuth, etc. In addition, although credentials can be collected by the TAS, that does not mean the TAS will or needs to persist them, or pass them on the wire if a digest or other mechanism is available.

          Now back to the comments on how this work should proceed:

          > I do think that breaking things down into sub-tasks is a good idea

          I agree with that, and we will update the JIRA(s) in the near future.

          > One work item that fits into your design is that of adding token support to RPC endpoints. This is a work item that would add value for customers right away while still allowing flexibility in the rest of the design. This is something we would like to begin work on now (after consulting Daryn Sharp, since I understand he's been doing some work in this area).

          We agree that token support for RPC endpoints provides immediate value for customers. We are working on a patch for RPC-level authN, based on some good work from Daryn. Hopefully we can publish that patch soon and start a more “narrowly-scoped discussion” around patches for the sub-tasks under the HADOOP-9392 umbrella.

          Regards.

          Sanjay Radia added a comment -

          This document helps clarify the proposal. Thanks. I would like to clear up terminology confusion in two areas: the terms "token" and "Token Authentication Service".

          • Hadoop already has tokens used for authentication. Discussions in this JIRA clarified that the Hadoop tokens are general and not limited to HDFS, as was originally mentioned in this JIRA.
          • Further, all authentication solutions use tokens/tickets, so "token-based" is not the distinguishing characteristic of this solution. Indeed, its distinguishing characteristic is a different model for pluggability.

          Hence I would like to propose changing the name of TAS, and also adding a suffix or prefix to the new tokens to avoid confusion with the existing Hadoop tokens. The TAS is really a federated authentication service, where each TAS is centralized. So how about calling it a Hadoop Authentication Service (HAS)? Or perhaps a Pluggable Authentication Service, PAS (or HPAS?). Indeed, pluggability is its distinguishing characteristic: you don't have to plug in at the RPC layer but in this service. As for the name of the new tokens: PAS-tokens or HAS-tokens, depending on whether the service is called HAS or PAS.

          Hide
          Kai Zheng added a comment -

          Thanks for your thoughts. Sorry for the late reply; we've been busy breaking down TokenAuth and working on our initial patch. HAS (Hadoop A* Server) is good and I like it. The 'A' of HAS could stand for "Authentication", "Authorization", or "Auditing", or a combination of them, depending on which role(s) HAS is provisioned with. This makes it more flexible and easier to evolve in the future. At a high level, we may need a centralized Authentication Server, a centralized Authorization Server, or even a centralized Auditing Server, and it would be great if such servers could be combined into one centralized server, or deployed separately to address network/multi-tenancy concerns. Currently we're mainly focusing on the "Authentication" and "Authorization" aspects; the two roles can be provisioned into one server or into separate servers, and the server can simply be unified and named HAS.

          In TokenAuth, we use Identity Token and Access Token where appropriate, and only say "token" in contexts where it can be clearly interpreted as either an Identity Token or an Access Token. If HAS is accepted, then we can use "HAS tokens" to mean the Identity/Access Tokens in TokenAuth, as you suggested. Considering that Hadoop's existing tokens, such as delegation tokens, block tokens, and job tokens, are widely used today, we can continue to use "Hadoop tokens" to refer to them.

          Please let me know your further thoughts on this. Thanks.
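
          As a rough editorial sketch of the Identity Token / Access Token split described above (all names hypothetical, not the actual TokenAuth API): a client authenticates once to obtain an Identity Token, then exchanges it for Access Tokens scoped to individual services, which is the SSO flow this design targets.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two token kinds discussed in this thread.
final class Token {
    final String kind;      // "identity" or "access"
    final String principal;
    final String service;   // non-null only for access tokens
    Token(String kind, String principal, String service) {
        this.kind = kind;
        this.principal = principal;
        this.service = service;
    }
}

// Toy HAS-like service: authenticate once (SSO), then exchange the
// resulting Identity Token for Access Tokens scoped to each service.
class AuthService {
    private final Map<String, String> users = new HashMap<>();

    void addUser(String user, String password) {
        users.put(user, password);
    }

    // Single sign-on step: credentials are presented only here.
    Token authenticate(String user, String password) {
        if (!password.equals(users.get(user))) {
            throw new SecurityException("bad credentials for " + user);
        }
        return new Token("identity", user, null);
    }

    // Subsequent accesses exchange the Identity Token, not credentials.
    Token exchange(Token identity, String service) {
        if (!"identity".equals(identity.kind)) {
            throw new IllegalArgumentException("identity token required");
        }
        return new Token("access", identity.principal, service);
    }
}
```

          A real implementation would sign both tokens and have the target service verify the Access Token; the sketch only illustrates why the two kinds are kept distinct.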

          Hide
          Kai Zheng added a comment -

          This is the main breakdown doc. The whole TokenAuth work is divided into 3 parts: the TokenAuth Framework, the TokenAuth implementation (HAS), and TokenAuth Integration. We define each of them in the doc and have opened JIRAs for them accordingly. We're working on the further breakdown at the appropriate level and will update this initial doc once it's complete.

          Hide
          Sanjay Radia added a comment -

          Looks like we are mostly in agreement. However, I do not agree with the following:

          The 'A' of HAS could stand for "Authentication", "Authorization", or "Auditing", or a combination of them, depending on which role(s) HAS is provisioned with. This makes it more flexible and easier to evolve in the future.

          I understand the notion of a central authentication server, and that is what you have explained in the design. I believe that most, if not all, of the authorization belongs closer to the resource servers being accessed. So for now let's just call this the Hadoop Authentication Service. Later, if and when we have a design for centralized authorization, we can expand the scope of the service.

          I would like to change this JIRA's title to "Hadoop Authentication Service". Also, drop "SSO" from the title since that is not unique to HAS: today's Kerberos authentication service supports SSO just as HAS will.

          Hide
          Sanjay Radia added a comment -

          I would like to change this jira's title to "Hadoop Authentication Service". ...

          Sorry, I had not noticed that you had created HADOOP-9798. The title I suggested applies more to that JIRA.
          So this JIRA is really about "making Hadoop authentication pluggable beyond Kerberos and Hadoop tokens".


            People

            • Assignee:
              Kai Zheng
              Reporter:
              Kai Zheng
            • Votes:
              2
              Watchers:
              53
