HIVE-842: Authentication Infrastructure for Hive

Description

      This issue deals with the authentication (user name, password) infrastructure, not the authorization components that specify what a user should be able to do.

      Attachments

      1. hive-842_2.patch (26 kB, Ashutosh Chauhan)
      2. hive-842.txt (62 kB, Todd Lipcon)
      3. HiveSecurityThoughts.pdf (78 kB, John Sichi)

          Activity

          Edward Capriolo added a comment -

          hive.conf

          <property>
            <name>hive.authenticate.class</name>
            <value>org.apache.hadoop.hive.auth.DefaultAuthenticator</value>
            <description>Use this setting to plug in your own authentication framework (LDAP, MySQL, etc.)</description>
          </property>
          
          public interface Authenticator {
            public boolean authenticate(SessionState session);
          }

          // Default implementation: accepts every session unconditionally.
          public class DefaultAuthenticator implements Authenticator {
            public boolean authenticate(SessionState session) {
              return true;
            }
          }
          

          Thus the authentication is pluggable:

          // Example: check a shared secret against values in the session's configuration.
          public class SharedSecretAuthenticator implements Authenticator {
            public boolean authenticate(SessionState session) {
              return session.getConf().getVar("USERNAME").equals("admin")
                  && session.getConf().getVar("PASSWORD").equals("secret");
            }
          }
          

          It would be trivial to then implement LDAP, MySQL, or other types of authentication. The call to the authenticator could be plugged into the API anywhere a reference to the client's SessionState exists.
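
          As a concrete illustration, the configured class name could be resolved via reflection; this factory is a hypothetical sketch, not part of any patch here:

              import org.apache.hadoop.conf.Configuration;

              public class AuthenticatorFactory {
                public static Authenticator create(Configuration conf) throws Exception {
                  // Fall back to the permissive default when nothing is configured.
                  String className = conf.get("hive.authenticate.class",
                      "org.apache.hadoop.hive.auth.DefaultAuthenticator");
                  return (Authenticator) Class.forName(className).newInstance();
                }
              }

          A deployment could then drop an LDAP- or MySQL-backed Authenticator on the classpath and switch implementations purely through configuration.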

          Edward Capriolo added a comment -

          @Min I added you as a watcher on this issue; I hope you do not mind.

          At Hadoop World NYC I got to listen to Owen O'Malley give his presentation on Hadoop security. He also took some time to answer some questions for me.

          In summary, Hadoop 0.22 is going to have authentication at the RPC layer. This can be turned on and off through Hadoop configuration. This authentication will be able to use Kerberos or Active Directory's Kerberos implementation.

          DFS is the easy case: you authenticate directly to it.
          MapReduce is another beast. The JobTracker/TaskTracker will have to run jobs as the user on the system! So my jobs will be run from my POSIX account (I am not sure if this is in place on only the JobTracker or on the TaskTracker as well).

          Programs that act as proxies, like the JobTracker, might need a binary shim that starts them as the root user and then drops to a hadoop user; this is also required to run jobs as that user.

          "Why kerberos?" I asked him. Kerberos allows a ticket to be created and attached to you session. This is because kerberos can create you a ticket that you can then pass onto the job tracker for example. Otherwise you would have to password/key on the job tracker itself which would be nasty to put your password in a jobconf.

          So, it seems like proxy-type applications like HWI and HiveServer may have to take some part in passing around the Kerberos tickets.

          The Hadoop web interfaces will use Kerberos as well. SPNEGO is a protocol for this, and it has good cross-browser support. So that is the future...

          Min Zhou added a comment -

          @Edward

          Kerberos for authentication is a good approach, I think; user/password is not needed here. That could be implemented in the future.
          BTW, we've finished the development of the authorization infrastructure for Hive.

          John Sichi added a comment -

          For lack of a better place, uploading this doc from Venkatesh here so I can link it from the wiki.

          Todd Lipcon added a comment -

          As discussed at the last contributor meeting, I am working on authenticating access to the metastore by kerberizing the Thrift interface.

          Plan is currently:
          1) Update the version of Thrift in Hive to 0.4.0.
          2) Temporarily check in the SASL support from Thrift trunk (this will be in the 0.5.0 release, due out sometime in October).
          3) Build a bridge between Thrift's SASL support and Hadoop's UserGroupInformation classes. Thus, if a user has a current UGI on the client side, it will be propagated to the JAAS context on the handler side.
          4) In places where the metastore accesses the file system, use the "proxy user" functionality to act on behalf of the authenticated user.
          5) When we detect that we are running on secure Hadoop with security enabled, enable the above functionality.

          I'd like to attack the Hive Web UI separately.

          One open question:

          • Do Hive tasks ever need to authenticate to the metastore? If so, we will have to build a delegation token system into Hive.
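
          For a rough picture of what step 2 buys us, here is a sketch only - TSaslClientTransport is Thrift's real SASL transport class, but the service name, host, and port below are placeholders, and the call must run inside a Kerberos-authenticated JAAS/UGI context:

              import org.apache.thrift.transport.TSaslClientTransport;
              import org.apache.thrift.transport.TSocket;
              import org.apache.thrift.transport.TTransport;

              public class SaslClientSketch {
                public static TTransport open(String metastoreHost) throws Exception {
                  TTransport socket = new TSocket(metastoreHost, 9083);
                  TTransport sasl = new TSaslClientTransport(
                      "GSSAPI",       // SASL mechanism used for Kerberos
                      null,           // authorizationId (defaults to the authenticated principal)
                      "metastore",    // Kerberos service name (first part of the principal)
                      metastoreHost,  // server host; must match the service principal
                      null,           // extra SASL properties
                      null,           // callback handler (not needed for GSSAPI)
                      socket);
                  sasl.open();        // performs the SASL/Kerberos handshake
                  return sasl;
                }
              }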
          Venkatesh Seetharam added a comment -

          > * Do Hive tasks ever need to authenticate to the metastore? If so, we will have to build a delegation token system into Hive.
          I learned from Alan and Pradeep that Howl uses the commit task to talk to the metastore. Hence we'll have to build the delegation token system.

          Todd Lipcon added a comment -

          OK. The code in Hadoop Common is somewhat reusable for this, so it shouldn't be too hard to implement. If I recall correctly, though, the delegation tokens rely on a secret key that the master daemon periodically rotates. We need to add some kind of persistent token storage for this to work - I guess in the metastore's DB?

          To make this easier to review, I'd like to do the straight kerberos first, and then add delegation tokens in a second patch/JIRA. Sound good?
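
          For illustration only, persistent token storage could look something like the interface below; the name and methods are invented here, not taken from any patch:

              // Hypothetical store for delegation-token state (rotating master keys
              // plus issued tokens), e.g. backed by tables in the metastore's DB.
              public interface DelegationTokenStore {
                int addMasterKey(String encodedKey);     // persist a newly rotated secret key
                void removeMasterKey(int keySeq);        // drop an expired key
                boolean addToken(String tokenId, String encodedToken);
                String getToken(String tokenId);         // null if unknown
                boolean removeToken(String tokenId);     // e.g. on cancellation
              }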

          Venkatesh Seetharam added a comment -

          Sounds good to me.

          Edward Capriolo added a comment -

          By "attack the Web UI separately", what is meant? Will it be broken or non-functional at any phase here? That is what I find happens often; some of it is really the web UI's fault for using JSP and not servlets, but there is no simple way to get test coverage for the web UI and all the different ways it gets broken.

          Todd Lipcon added a comment -

          I don't anticipate breaking the web UI (or anything) on non-secure Hadoop versions. But it will probably be insecure to run the web UI, which currently trusts users to say who they want to be - i.e. I don't plan in the short term to integrate an auth layer for the web UI itself.

          Todd Lipcon added a comment -

          I have this basically working. A couple questions I wanted to run by people before posting a patch:

          • Should the metastore always take HDFS actions as the user making the RPC? Or, for example, with a create table call, should it act as the "owner" specified in the Thrift call regardless of the authenticated user? If the latter, what authorization mechanism do we need? (i.e. is there a use case where user A can create tables on behalf of user B?)
          • Are there any metastore operations that should be done as a metastore principal, or should all HDFS access be done as the authenticated user?
          • If we see that Hadoop Security is enabled, should we enable SASL on the metastore thrift server by default? If SASL-thrift is not enabled, what user should the metastore act as? In other words, should there be an option whereby the metastore uses a keytab to authenticate to HDFS, but doesn't require users to authenticate to it?
          Venkatesh Seetharam added a comment -

          > Should the metastore always take HDFS actions as the user making the RPC?
          Yes, the metastore will run as a super-user (Hadoop proxy user), enabling doAs operations, and impersonate the target user while accessing data on HDFS.

          > If we see that Hadoop Security is enabled, should we enable SASL on the metastore thrift server by default?
          I'd think so.

          > should there be an option whereby the metastore uses a keytab to authenticate to HDFS, but doesn't require users to authenticate to it?
          Wouldn't this leave a hole as it currently exists?
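
          A minimal sketch of that doAs pattern, using Hadoop's UserGroupInformation API; the user name and path are placeholders, and the cluster must also have hadoop.proxyuser.* settings allowing the metastore principal to impersonate clients:

              import java.security.PrivilegedExceptionAction;
              import org.apache.hadoop.conf.Configuration;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;
              import org.apache.hadoop.security.UserGroupInformation;

              public class MetastoreDoAsSketch {
                public static void createTableDir(final String clientUser) throws Exception {
                  // The metastore's own kerberos-authenticated identity (from its keytab).
                  UserGroupInformation realUgi = UserGroupInformation.getLoginUser();
                  // Impersonate the user who made the RPC.
                  UserGroupInformation clientUgi =
                      UserGroupInformation.createProxyUser(clientUser, realUgi);
                  // Everything inside doAs hits HDFS as clientUser, not as the metastore.
                  clientUgi.doAs(new PrivilegedExceptionAction<Void>() {
                    public Void run() throws Exception {
                      FileSystem fs = FileSystem.get(new Configuration());
                      fs.mkdirs(new Path("/user/hive/warehouse/" + clientUser + "_table"));
                      return null;
                    }
                  });
                }
              }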

          Todd Lipcon added a comment -

          > should there be an option whereby the metastore uses a keytab to authenticate to HDFS, but doesn't require users to authenticate to it?

          Wouldn't this leave a hole as it currently exists?

          Yeah - I think the use case is that you may have some old Thrift clients that haven't yet been updated to work with the SASL implementation (e.g. PHP). For those clients, perhaps you can provide security based on firewall rules, etc. But you would still like to run Hive on top of a secured HDFS.

          Todd Lipcon added a comment -

          Here's a "preview" patch of this work. A few notes:

          • This checks in a bunch of Thrift classes that are in Thrift trunk. Thrift is currently in the RC phase for an 0.5.0 release, so we can drop these Thrift classes from Hive as soon as that's out (probably before this patch is even ready for commit).
          • There are still some javadocs that could be improved a little bit.
          • There's currently not any integration into the "guts" of Hive - we simply assume the calling user's identity as soon as the RPC is received. I think that's OK for the scope of this patch, as discussed above.

          There's a bit of a lurking bug, I believe, due to HADOOP-6982, but it shouldn't be major.

          Pradeep Kamath added a comment -

          I tried applying this patch after applying the patch for HIVE-1264 and got the following compile errors, which seem to suggest I am missing some jar (seems Thrift-related) - any pointers on how to resolve these errors?

          build_shims:
               [echo] Compiling shims against hadoop 0.20.104.3.1007202301 (/tmp/hive-svn/build/hadoopcore/hadoop-0.20.104.3.1007202301)
              [javac] Compiling 8 source files to /tmp/hive-svn/build/shims/classes
              [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/thrift/transport/TSaslTransport.java:109: cannot find symbol
              [javac] symbol  : class TMemoryInputTransport
              [javac] location: class org.apache.thrift.transport.TSaslTransport
              [javac]   private TMemoryInputTransport readBuffer = new TMemoryInputTransport();
              [javac]           ^
              [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:354: cannot find symbol
              [javac] symbol  : method getBuffer()
              [javac] location: class org.apache.thrift.transport.TTransport
              [javac]       return wrapped.getBuffer();
              [javac]                     ^
              [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:352: method does not override or implement a method from a supertype
              [javac]     @Override
              [javac]     ^
              [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:359: cannot find symbol
              [javac] symbol  : method getBufferPosition()
              [javac] location: class org.apache.thrift.transport.TTransport
              [javac]       return wrapped.getBufferPosition();
              [javac]                     ^
              [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:357: method does not override or implement a method from a supertype
              [javac]     @Override
              [javac]     ^
              [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:364: cannot find symbol
              [javac] symbol  : method getBytesRemainingInBuffer()
              [javac] location: class org.apache.thrift.transport.TTransport
              [javac]       return wrapped.getBytesRemainingInBuffer();
              [javac]                     ^
              [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:362: method does not override or implement a method from a supertype
              [javac]     @Override
              [javac]     ^
              [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:369: cannot find symbol
              [javac] symbol  : method consumeBuffer(int)
              [javac] location: class org.apache.thrift.transport.TTransport
              [javac]       wrapped.consumeBuffer(len);
              [javac]              ^
              [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge20S.java:367: method does not override or implement a method from a supertype
              [javac]     @Override
              [javac]     ^
              [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/thrift/transport/TSaslTransport.java:109: cannot find symbol
              [javac] symbol  : class TMemoryInputTransport
              [javac] location: class org.apache.thrift.transport.TSaslTransport
              [javac]   private TMemoryInputTransport readBuffer = new TMemoryInputTransport();
              [javac]                                                  ^
              [javac] /tmp/hive-svn/shims/src/0.20S/java/org/apache/thrift/transport/TSaslTransport.java:352: cannot find symbol
              [javac] symbol  : method encodeFrameSize(int,byte[])
              [javac] location: class org.apache.thrift.transport.TFramedTransport
              [javac]     TFramedTransport.encodeFrameSize(length, lenBuf);
              [javac]                     ^
              [javac] Note: /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java uses or overrides a deprecated API.
              [javac] Note: Recompile with -Xlint:deprecation for details.
              [javac] Note: /tmp/hive-svn/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java uses unchecked or unsafe operations.
              [javac] Note: Recompile with -Xlint:unchecked for details.
              [javac] 11 errors
          
          
          Todd Lipcon added a comment -

          Hey Pradeep. You also need HIVE-1526, which updates Hive to use Thrift 0.4.0.

          Pradeep Kamath added a comment -

          Hey Todd, I applied the patches in the following sequence on current Hive trunk:
          hive-1264.txt, hive-842.txt, and then HIVE-1526.2.patch.txt. The last one didn't apply cleanly for ql/src/gen-javabean/org/apache/hadoop/hive/ql/plan/api/StageType.java, so I manually edited it based on the reject file. After that, I get the following compile error:

          [javac] Compiling 607 source files to /tmp/hive-svn/build/ql/classes
          [javac] /tmp/hive-svn/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java:384: cannot find symbol
          [javac] symbol : class StageType
          [javac] location: class org.apache.hadoop.hive.ql.exec.MapRedTask
          [javac] public StageType getType() {
          [javac] ^
          [javac] /tmp/hive-svn/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java:385: cannot find symbol
          [javac] symbol : variable StageType
          [javac] location: class org.apache.hadoop.hive.ql.exec.MapRedTask
          [javac] return StageType.MAPREDLOCAL;
          [javac] ^
          [javac] /tmp/hive-svn/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java:214: getType() in org.apache.hadoop.hive.ql.exec.StatsTask cannot override getType() in org.apache.hadoop.hive.ql.exec.Task; attempting to use incompatible return type
          [javac] found : int
          [javac] required: org.apache.hadoop.hive.ql.plan.api.StageType
          [javac] public int getType() {
          [javac] ^
          [javac] /tmp/hive-svn/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java:215: cannot find symbol
          [javac] symbol : variable STATS
          [javac] location: class org.apache.hadoop.hive.ql.plan.api.StageType
          [javac] return StageType.STATS;
          [javac] ^
          [javac] /tmp/hive-svn/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java:213: method does not override or implement a method from a supertype
          [javac] @Override
          [javac] ^

          Pradeep Kamath added a comment -

          Adding the dependency on HIVE-1526

          Todd Lipcon added a comment -

          Seems like the patch that updates Thrift has fallen out of date with trunk. I'll try to regenerate it ASAP. You can probably fix the above issues by (a) importing StageType in MapRedTask, and (b) replacing StatsTask.getType's return value with the StageType enum (the new version of Thrift uses Java enums instead of ints to represent Thrift enums) - see the sketch below.
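
          Roughly, the fixes would look like this (inferred from the compile errors above, not taken from the actual patch):

              // In MapRedTask.java and StatsTask.java, import the generated enum:
              import org.apache.hadoop.hive.ql.plan.api.StageType;

              // In StatsTask, return the enum constant instead of an int:
              @Override
              public StageType getType() {
                return StageType.STATS;
              }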

          Pradeep Kamath added a comment -

          Hey Todd, I did the changes you mentioned and got it to compile. While trying to test it out, I had to run the metastore as a user whose keytab file only had a "user" principal and not a "service" principal - so I hacked the code in the patch a little to not check whether the principal had the service/host@realm structure, and I hardcoded the host name into the calls. With all these machinations I got the server to run, tried running "show tables", and got the following at log level DEBUG (on the client side):

          javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Fail to create credential. (63) - No service creds)]
          at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194)
          at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:95)
          at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:254)
          at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:38)

          Do you think this is because I don't have a "service" principal in the keytab used by the metastore?

          Todd Lipcon added a comment -

          Hey Pradeep. It sounds like it might be - I haven't seen that error before, but I have also only been testing with actual service principals (i.e. principals of the form metastore/<hostname>).

          You can try running both sides with HADOOP_OPTS="-Dsun.security.krb5.debug=true" and it should give you some extra details.

          Pradeep Kamath added a comment -

          Hey Todd - I managed to overcome the issues I was facing earlier by having a "service" principal in the keytab. I did notice, though, that after a couple of days of the metastore running, I would get an error for a create table, and judging by the error message it appeared that the authentication between the metastore and the namenode failed with "Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)". Is this the bug you opened as HADOOP-6982? Currently the metastore needs a restart to overcome this.

          Pradeep Kamath added a comment -

          I looked at the issue of the server requiring restarts with Devaraj Das, who worked on Hadoop security - he suggested a couple of changes (below), and that solved it; the server no longer needs a restart.
          Apparently UserGroupInformation.loginUserFromKeytabAndReturnUGI() does not set the loginUser member, while UserGroupInformation.loginUserFromKeytab() does. He also suggested another change: not caching the realUser. Both changes are below:

          In the following code:

              private Server(String keytabFile, String principalConf)
                  throws TTransportException {
                ...
                realUgi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                    kerberosName, keytabFile);
                assert realUgi.isFromKeytab();

          I had to change the above lines to:

                // loginUserFromKeytab() sets the static login user, which keeps the
                // keytab-based TGT renewed; loginUserFromKeytabAndReturnUGI() does not.
                UserGroupInformation.loginUserFromKeytab(kerberosName, keytabFile);
                realUgi = UserGroupInformation.getLoginUser();

          Likewise in:

              public boolean process(final TProtocol inProt, final TProtocol outProt)
                  throws TException {
                TTransport trans = inProt.getTransport();
                ...
                UserGroupInformation clientUgi = UserGroupInformation.createProxyUser(
                    authId, realUgi);

          I changed the above to:

                // Fetch the login user each time instead of caching realUgi.
                UserGroupInformation clientUgi = UserGroupInformation.createProxyUser(
                    authId, UserGroupInformation.getLoginUser());

          
          Todd Lipcon added a comment -

          Hey Pradeep. Those changes seem reasonable. I'm not personally a fan of the "login user" concept in Hadoop security - it's static state, which prevents servers that may want to use multiple principals from doing so easily (e.g. if running a Hive server with an embedded metastore, you may need different principals for the two pieces). But given that there is no "renewer" thread for non-login-user keytab logins, it may be the only choice for now.

          Pradeep Kamath added a comment -

          I have tested this patch after making HIVE-1526 work and with the changes suggested in my earlier comment: https://issues.apache.org/jira/browse/HIVE-842?focusedCommentId=12924020&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12924020. It works well in the testing I have done. Once HIVE-1526 is regenerated, this patch can be regenerated accordingly and hopefully committed soon to enable working with Hadoop security.

          Ashutosh Chauhan added a comment -

          Updated patch on latest trunk. It also incorporates the changes discussed in the previous comment. It needs to be applied on top of HIVE-1526; fortunately, since there are no common files between the two, it applies cleanly on top of the latest HIVE-1526 patch (HIVE-1526.3.patch.txt).

          Ashutosh Chauhan added a comment -

          HIVE-1526 went in. This is ready to go in as well. The patch (hive-842_2.patch) applies cleanly. Please review.

          John Sichi added a comment -

          Click "Submit Patch" to get it into the review queue.

          Ashutosh Chauhan added a comment -

          Let's get it in the review queue.

          He Yongqiang added a comment -

          Will take a look.

          Devaraj Das added a comment -

          The security related changes look fine to me.

          He Yongqiang added a comment -

          Looks good to me. Is there a way to add a testcase for it?

          Devaraj Das added a comment -

          Thanks, Yongqiang, for reviewing. Writing a testcase is really difficult - it requires a Kerberos setup. We have manually tested this patch at Yahoo!...

          He Yongqiang added a comment -

          Will run tests and commit.

          Ashutosh Chauhan added a comment -

          I ran the existing tests and they succeeded.

          He Yongqiang added a comment -

          Committed! Thanks Ashutosh!


            People

            • Assignee: Todd Lipcon
            • Reporter: Edward Capriolo