Details

    • Hadoop Flags:
      Reviewed

      Description

      As discussed in HIVE-842, Kerberos authentication is sufficient only for authenticating a Hive user client to the metastore. There are other cases where Thrift calls must be authenticated even though the caller runs in an environment without Kerberos credentials. For example, an MR task running as part of a Hive job may want to report statistics to the metastore, or a job may be running within the context of Oozie or Hive Server.

      This JIRA is to implement support for delegation tokens in the metastore. The concept of a delegation token is borrowed from the Hadoop security design. In short, a Kerberos-authenticated client may retrieve a binary token from the server. This token can then be passed to other clients, which can use it to authenticate as the original user in lieu of a Kerberos ticket.
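
The token idea described above can be illustrated with a minimal, self-contained sketch. It is not Hive or Hadoop code: the class name, the HMAC-SHA256 choice, and the "owner\nrenewer" identifier layout are all assumptions made for illustration. The real facilities in Hadoop Common also handle expiry, renewal, and key rolling, which are omitted here.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of the delegation-token concept; not Hive or Hadoop code. */
class DelegationTokenSketch {
    /** A token pairs a readable identifier with a password derived from a server-side secret. */
    static final class Token {
        final byte[] identifier;
        final byte[] password;

        Token(byte[] identifier, byte[] password) {
            this.identifier = identifier;
            this.password = password;
        }
    }

    private final SecretKeySpec masterKey;

    DelegationTokenSketch(byte[] secret) {
        this.masterKey = new SecretKeySpec(secret, "HmacSHA256");
    }

    /** Computes the HMAC of the identifier under the server's secret key. */
    private byte[] sign(byte[] identifier) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(masterKey);
            return mac.doFinal(identifier);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Server side: called for a client that has already authenticated with Kerberos. */
    Token issue(String owner, String renewer) {
        byte[] id = (owner + "\n" + renewer).getBytes(StandardCharsets.UTF_8);
        return new Token(id, sign(id));
    }

    /** Server side: returns the original owner if the token is genuine, otherwise throws. */
    String authenticate(Token token) {
        byte[] expected = sign(token.identifier);
        if (!MessageDigest.isEqual(expected, token.password)) {
            throw new SecurityException("invalid delegation token");
        }
        String id = new String(token.identifier, StandardCharsets.UTF_8);
        return id.substring(0, id.indexOf('\n'));
    }
}
```

Any holder of the token, such as an MR task, can present it and be treated as the original owner, while only the server that holds the secret can mint or validate tokens.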

      1. hive-1696-4-with-gen-code.patch
        581 kB
        Carl Steinbach
      2. hive-1696-4-with-gen-code.1.patch
        581 kB
        Devaraj Das
      3. hive-1696-4.patch
        66 kB
        Devaraj Das
      4. hive-1696-4.patch
        66 kB
        Devaraj Das
      5. hive-1696-3-with-gen-code.patch
        393 kB
        Devaraj Das
      6. hive-1696-3.patch
        64 kB
        Devaraj Das
      7. hive-1696-1-with-gen-code.patch
        380 kB
        Devaraj Das
      8. hive-1696-1.patch
        52 kB
        Devaraj Das
      9. hive_1696.patch
        271 kB
        Ashutosh Chauhan
      10. hive_1696.patch
        260 kB
        Ashutosh Chauhan
      11. hive_1696_no-thrift.patch
        44 kB
        Ashutosh Chauhan

          Activity

          Devaraj Das added a comment -

          For the record, I'd like to mention that Pradeep Kamath did a lot of initial work on the patch. Thanks, Pradeep!

          Namit Jain added a comment -

          Committed. Thanks Devaraj

          Devaraj Das added a comment -

          Yes, I had forgotten to put it in PA state. Thanks!

          Namit Jain added a comment -

          Devaraj, I am assuming this is ready for review

          Devaraj Das added a comment -

          Thanks Carl!
          My earlier patch had a typo in the test case, introduced by a mistake I made while generating the patch, and it would cause the test to fail. I edited Carl's patch to fix that.

          Carl Steinbach added a comment -

          Devaraj's patch with generated Thrift code.

          John Sichi added a comment -

          Looks fine to me. Can you post the patch with generated code so I can do some testing?

          Devaraj Das added a comment -

          Sorry, I missed updating the metastore client with the "kerberos" prefix in the "principal name" references. This patch fixes that.

          Devaraj Das added a comment -

          I renamed "principal" to "kerberos_principal" in the Thrift calls introduced by this patch. This patch also has only one getTokenStrForm method in the shim (its implementation already handles what the other method was handling), and I added detailed Javadoc for it. This patch is without the generated code. Once I get a go-ahead on this patch, I will submit a patch with the generated code. Could someone take a quick look at this patch, please?

          John Sichi added a comment -

          HIVE-78 (which is really large, and conflicts with this one due to the metastore codegen) has been wending its way through review for quite some time; I think Namit is going to try to get it committed tomorrow, and then we'll need an update on this one.

          A few comments from me:

          • HIVE-78's metastore API additions also reference "principal", but with a different meaning, so we should find a way to distinguish them.
          • The new conf variable should be named hive.metastore.token.signature.
          • In HadoopShims.java, the overload for getTokenStrForm needs Javadoc for the tokenSignature parameter.

          Ashutosh Chauhan added a comment -

          Is somebody taking a look at this? Maintaining a separate branch for an uncommitted patch is no fun : (

          Carl Steinbach added a comment -

          Reviewboard request: https://reviews.apache.org/r/198/

          Devaraj Das added a comment -

          Patch with the generated code.

          Devaraj Das added a comment -

          The attached patch has a test case that exercises Hive MetaStore client to MetaStore server communication with SASL enabled and delegation tokens. The test runs as part of the top-level "test" target.

          Devaraj Das added a comment -

          This is with the generated code.

          Devaraj Das added a comment -

          Patch with a little refactoring of the previous patch. I also removed the code that checks whether a client is Kerberos-authenticated in the getDelegationToken and renewDelegationToken methods. A cleaner implementation of that check requires some changes to a Hadoop security API. We can do that in a follow-up JIRA.

          Devaraj Das added a comment -

          The patch looks fine to me.

          Ashutosh Chauhan added a comment -

          Yongqiang,

          Can you take a look? We have been testing it for a month in our test cluster. This patch is ready to go in.

          Ashutosh Chauhan added a comment -

          New patches (with and without thrift).

          Devaraj,
          This includes both of your suggestions. Can you take a look again?

          Devaraj Das added a comment -

          Upon a bit more thought, it seems to me that we don't actually need the DelegationTokenManager shim. Since the delegation token stuff is part of Hadoop 20S, we might as well merge these classes/methods in the 20S shim...

          Ashutosh Chauhan added a comment -

          Devaraj,

          Thanks for the review. I will incorporate those changes and will generate a new patch.

          Devaraj Das added a comment -

          I did a walk-through of the patch. It looks mostly good. One comment: the server should check that delegation-token issuance and renewal are allowed only for Kerberos-authenticated users. This is what is done for HDFS and MapReduce delegation tokens.
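
The kind of server-side check suggested here could look roughly like the following sketch. The enum and method names are hypothetical stand-ins, not the Hadoop or Hive API; the only point is that a caller who authenticated with a token must not be able to obtain or renew further tokens.

```java
/** Hypothetical sketch of a Kerberos-only policy for token issue/renewal. */
class TokenIssuePolicySketch {
    /** Stand-in for how the caller authenticated on this connection (assumed values). */
    enum AuthMethod { KERBEROS, DELEGATION_TOKEN }

    /** Rejects issue or renew requests from callers that did not authenticate with Kerberos. */
    static void checkCanIssueOrRenew(String user, AuthMethod method) {
        if (method != AuthMethod.KERBEROS) {
            throw new SecurityException(
                "Delegation tokens can be issued or renewed only with Kerberos authentication;"
                + " user=" + user + ", method=" + method);
        }
    }
}
```

A handler for a getDelegationToken or renewDelegationToken call would invoke such a check before touching the token store.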

          Ashutosh Chauhan added a comment -

          This builds on top of the current HIVE-842 patch and adds delegation token support for Hive.

          Todd Lipcon added a comment -

          A few of us had a phone call this morning. We briefly discussed a design for this, summarized below:

          • The metastore should make use of the delegation token facilities in Hadoop Common. The classes in Common are already generic since they're used by both MR and HDFS for their delegation token types.
          • The metastore needs to keep track of active delegation tokens across restarts - it probably makes sense to use the existing DB backing store for this.
          • The metastore thrift API will need a new call, something like: binary getDelegationToken(1: string renewer) which returns the opaque token.
          • We'll need to make some changes to HadoopThriftAuthBridge from HIVE-842 in order to support using a delegation token over SASL.

          In terms of the use cases above, here are some thoughts on how the delegation tokens will be used:

          MR tasks reporting statistics

          When a hive job is submitted, it will first obtain a DT from the hive metastore. This DT will be passed with the job, either as a private distributedcache file, or maybe base64-encoded in the jobconf itself. The MR tasks themselves will then load the token into the UGI before making calls. This is basically the pattern that normal hadoop MR jobs use to access HDFS from within a task.
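
As a rough illustration of the base64-in-jobconf option, here is a minimal sketch. A plain Map stands in for the job configuration, and the property name is hypothetical, chosen only for this example.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: shipping an opaque token as a string property. */
class TokenInJobConfSketch {
    // Assumed property name, used only in this example.
    static final String TOKEN_KEY = "example.metastore.delegation.token";

    /** Submitting client: store the opaque token bytes as a base64 string. */
    static void putToken(Map<String, String> jobConf, byte[] token) {
        jobConf.put(TOKEN_KEY, Base64.getEncoder().encodeToString(token));
    }

    /** Task side: recover the token bytes, or null if no token was shipped. */
    static byte[] getToken(Map<String, String> jobConf) {
        String encoded = jobConf.get(TOKEN_KEY);
        return encoded == null ? null : Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        putToken(jobConf, new byte[] {0, 1, 2, (byte) 0xff});
        System.out.println(jobConf.get(TOKEN_KEY));
    }
}
```

In the task, the decoded bytes would then be turned back into a token and loaded into the UGI before the metastore call is made.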

          Oozie or Hive Server jobs

          Before Oozie or Hive Server forks the child process that actually runs the job, it will need to obtain a delegation token from the metastore on behalf of the user running the job. It will then provide this token to the child process using an environment variable or configuration property. In this case, Oozie or the Hive Server needs to be configured as a "proxy superuser" on the metastore, i.e. the oozie/_HOST or hiveserver/_HOST principal is allowed to impersonate other users in order to grab delegation tokens for them.
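
The "proxy superuser" rule can be sketched as a simple allow-list lookup. Everything here is an assumption for illustration: the class, the map-based configuration, and the "*" wildcard are not taken from the patch.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a proxy-superuser check on the metastore side. */
class ProxyUserSketch {
    // Assumed configuration: service principal -> users it may impersonate ("*" means any user).
    private final Map<String, Set<String>> allowed;

    ProxyUserSketch(Map<String, Set<String>> allowed) {
        this.allowed = allowed;
    }

    /** Returns the user the token should be issued for, or throws if impersonation is not allowed. */
    String resolveTokenOwner(String authenticatedPrincipal, String requestedUser) {
        if (authenticatedPrincipal.equals(requestedUser)) {
            return requestedUser; // a user may always request a token for themselves
        }
        Set<String> users = allowed.get(authenticatedPrincipal);
        if (users == null || !(users.contains("*") || users.contains(requestedUser))) {
            throw new SecurityException(
                authenticatedPrincipal + " is not allowed to impersonate " + requestedUser);
        }
        return requestedUser;
    }
}
```

Once the service has obtained the token for the resolved owner, it could hand it to the child process through an environment variable, for example via `ProcessBuilder.environment()`.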


            People

            • Assignee: Devaraj Das
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 11
