HIVE-78

Authorization infrastructure for Hive

    Details

    • Hadoop Flags:
      Reviewed

      Description

      Allow Hive to integrate with existing user repositories for authentication and authorization information.

      Attachments

      1. createuser-v1.patch
        26 kB
        Min Zhou
      2. HIVE-78.1.nothrift.patch
        1.66 MB
        He Yongqiang
      3. HIVE-78.1.thrift.patch
        3.21 MB
        He Yongqiang
      4. HIVE-78.10.no_thrift.patch
        735 kB
        He Yongqiang
      5. HIVE-78.11.patch
        3.60 MB
        He Yongqiang
      6. HIVE-78.12.2.patch
        3.61 MB
        He Yongqiang
      7. HIVE-78.12.3.patch
        3.61 MB
        He Yongqiang
      8. HIVE-78.12.4.patch
        3.59 MB
        He Yongqiang
      9. HIVE-78.12.5.patch
        3.61 MB
        He Yongqiang
      10. HIVE-78.12.patch
        3.61 MB
        He Yongqiang
      11. HIVE-78.2.nothrift.patch
        272 kB
        He Yongqiang
      12. HIVE-78.2.thrift.patch
        3.36 MB
        He Yongqiang
      13. HIVE-78.4.complete.patch
        3.76 MB
        He Yongqiang
      14. HIVE-78.4.no_thrift.patch
        630 kB
        He Yongqiang
      15. HIVE-78.5.complete.patch
        4.15 MB
        He Yongqiang
      16. HIVE-78.5.no_thrift.patch
        651 kB
        He Yongqiang
      17. HIVE-78.6.complete.patch
        5.06 MB
        He Yongqiang
      18. HIVE-78.6.no_thrift.patch
        649 kB
        He Yongqiang
      19. HIVE-78.7.no_thrift.patch
        658 kB
        He Yongqiang
      20. HIVE-78.7.patch
        4.78 MB
        He Yongqiang
      21. HIVE-78.9.no_thrift.patch
        763 kB
        He Yongqiang
      22. HIVE-78.9.patch
        4.10 MB
        He Yongqiang
      23. hive-78.diff
        1 kB
        Edward Capriolo
      24. hive-78-metadata-v1.patch
        15 kB
        Min Zhou
      25. hive-78-syntax-v1.patch
        9 kB
        Min Zhou


          Activity

          Edward Capriolo added a comment -

          LDAP seems like a good way to handle this. We have a few alternatives.

          Any posixAccount can log into Hive. The LDAP search would be (&(objectClass=posixAccount)(uid=<user>)).

          We could enforce that the user must have some other attribute: (&(objectClass=posixAccount)(uid=<user>)(businessCategory=hiveuser)).

          We could enforce that the user must be valid and inside a specific groupOfUniqueNames: (&(objectClass=posixAccount)(uid=<user>)) combined with a memberOf check against hiveGroup; Apache mod_ldap can do this.

          We could create a supplemental schema attribute to append to already-existing LDAP users.
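
          For illustration, a minimal JNDI sketch of the second alternative. This is only a sketch: the host, base DN, and marker attribute value are hypothetical placeholders, not an agreed design.

          import java.util.Hashtable;
          import javax.naming.Context;
          import javax.naming.NamingEnumeration;
          import javax.naming.NamingException;
          import javax.naming.directory.DirContext;
          import javax.naming.directory.InitialDirContext;
          import javax.naming.directory.SearchControls;
          import javax.naming.directory.SearchResult;

          public class LdapHiveUserCheck {
            // Returns true when uid matches a posixAccount tagged as a hive user.
            public static boolean isHiveUser(String uid) throws NamingException {
              Hashtable<String, String> env = new Hashtable<String, String>();
              env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
              env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389"); // hypothetical host
              DirContext ctx = new InitialDirContext(env);
              try {
                SearchControls sc = new SearchControls();
                sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
                // the second alternative above: posixAccount plus a marker attribute
                String filter = "(&(objectClass=posixAccount)(uid={0})(businessCategory=hiveuser))";
                NamingEnumeration<SearchResult> results = ctx.search(
                    "ou=people,dc=example,dc=com", // hypothetical base DN
                    filter, new Object[] { uid }, sc);
                return results.hasMore();
              } finally {
                ctx.close();
              }
            }
          }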

          Ashish Thusoo added a comment -

          +1 on this.

          I also wanted to integrate this with AD through Kerberos, as that is perhaps the dominant user repository in most enterprises, and at least internally we have some users that do not have unix accounts (mostly analysts). We could use Samba to provide the bridge to AD, as there are certain nuances when it comes to Kerberos with AD, as well as the NTLM and NTLMv2 auths, that Samba has already solved.

          We should also think of providing integration with unix accounts - those maintained in the passwd db - especially for folks who just want to test authentication-specific features.

          In the past, the most dominant directories that I have found in enterprise environments are AD (can be bridged through LDAP and Samba), Sun Java One, Novell and OID (all LDAP directories), and Unix accounts.

          Thoughts?

          Edward Capriolo added a comment -

          I wanted to mention one more solution: JDBCRealm. This is pretty well established in Tomcat, it should be easy to retrofit, and it has support for roles.
          A password file is a good solution as well.

          Q. Active Directory is an LDAP directory at its core. What is a case where you need Samba to get at data in LDAP? It seems like we should be able to support Active Directory and LDAP using JNDI - http://forums.sun.com/thread.jspa?threadID=581425

          I was thinking about 'roles': hiveuser can issue queries and kill their own queries; hiveadmin can kill any user's queries.
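
          For reference, a Tomcat JDBCRealm is declared in server.xml along these lines; the driver, database, table, and column names below are illustrative, not a proposal for Hive's schema:

          <Realm className="org.apache.catalina.realm.JDBCRealm"
                 driverName="com.mysql.jdbc.Driver"
                 connectionURL="jdbc:mysql://localhost/authority?user=dbuser&amp;password=dbpass"
                 userTable="users" userNameCol="user_name" userCredCol="user_pass"
                 userRoleTable="user_roles" roleNameCol="role_name"/>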

          Ashish Thusoo added a comment -

          For Active Directory I think JNDI will work as long as we work off GSSAPI - so I think Kerb V should work with JNDI.

          However, the traditional authentication mechanisms of NTLM and NTLMv2 will not work with AD, I think, as they are proprietary protocols and the only public-domain implementations of them are in Samba. They are mostly an issue for old machines and old directory installations. We may as well do JNDI for now and address these later.

          Will check out JDBCRealm; I have not used it in the past.

          For query-side roles we could just model them on MySQL privileges. Some of the basic ones include:

          • SELECT
          • INSERT
          • ALTER TABLE
          • CREATE
          • DROP

          And on the server administration side, things like:

          • KILL SESSION(QUERY)
          • SHUTDOWN
          • STARTUP
          • VIEW SESSIONS

          are useful...

          We could roll these privileges up into role objects, so essentially your

          hiveuser role would become SELECT, INSERT, CREATE,
          while hiveadmin would become KILL SESSION, SHUTDOWN, STARTUP, VIEW SESSIONS, DROP, ALTER + whatever is in hiveuser
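
          One way to picture this roll-up is as named privilege sets; a minimal sketch (the Priv names simply mirror the lists above, and nothing here is settled API):

          import java.util.EnumSet;

          public class Roles {
            public enum Priv { SELECT, INSERT, CREATE, DROP, ALTER_TABLE,
                               KILL_SESSION, SHUTDOWN, STARTUP, VIEW_SESSIONS }

            // hiveuser is the basic bundle; hiveadmin is hiveuser plus the admin privileges
            public static final EnumSet<Priv> HIVEUSER =
                EnumSet.of(Priv.SELECT, Priv.INSERT, Priv.CREATE);
            public static final EnumSet<Priv> HIVEADMIN = adminSet();

            private static EnumSet<Priv> adminSet() {
              EnumSet<Priv> s = EnumSet.copyOf(HIVEUSER);
              s.addAll(EnumSet.of(Priv.KILL_SESSION, Priv.SHUTDOWN, Priv.STARTUP,
                                  Priv.VIEW_SESSIONS, Priv.DROP, Priv.ALTER_TABLE));
              return s;
            }
          }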

          Ashish Thusoo added a comment -

          Assigning to Edward as he is going to start on this... Thanks Edward!!

          Edward Capriolo added a comment -

          I would like to leverage the 'Realm' work that has already been done in Tomcat. This would give us the ability to plug into many standard authentication architectures.
          http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/realm/package-tree.html

          If we include a jar file in binary form from Tomcat, should it be part of the patch, or should we fork some of the Tomcat source? We should not have to alter the original code; we will be using it directly or extending it.

          Ashish Thusoo added a comment -

          At first look, Realms seem to be a nice fit for this problem.

          One capability that is missing there, and which may become an issue later, is the ability to compose roles into higher-level roles. To me it seems that roles are strictly flat and not hierarchical, so I cannot create an admin role that has the basic roles within it. Can this be achieved with Realms? I have not used them before, so I am not sure whether it is achievable.

          The other issue that I can think of is whether Realms is generic enough to protect any kind of resource, not just web resources - we have tables and partitions, servers, etc. Could you elaborate on how this would work for the capabilities that I listed in my previous comments?

          Edward Capriolo added a comment -

          Recursive role processing is probably not possible with JDBCRealm.

          Recursive role processing is generally difficult to implement. N.I.S. netgroups are an example: because of the recursive nature, you have a more complicated implementation. Firstly, you have to check for loops in the group definition: Role1 memberOf-> Role2 memberOf-> Role3 memberOf-> Role1. This needs to be done when the rule is created, or evaluated, or both. I have found (in my experience) that dynamic/recursive groups are less practical than they originally seem. They do have merit, however.

          The roles you mentioned were:

          • SELECT
          • INSERT
          • ALTER TABLE
          • CREATE
          • DROP
          • KILL SESSION(QUERY)
          • SHUTDOWN
          • STARTUP
          • VIEW SESSIONS

          IMPORTANT: Are roles global or per object? Realms really only make sense with global permissions.

          Let's look at a scenario:

          • Hive
            • tableA
            • tableB
            • tableC
          • Users
            • john
              • uid 3000
              • gid 3000,4000
            • bob
              • uid 3001
              • gid 3001,4000
          • Groups
            • john
              • gid 3000
            • bob
              • gid 3001
            • hr
              • gid 4000

          Goal: root has full access to all tables, john has access to tableA, bob has access to tableB, and tableC can be read by anyone in hr.

          • Realms
            • tableA_select
              • root
              • john
            • tableA_insert
              • root
              • john
            • tableB_select
              • root
              • bob
            • tableB_insert
              • root
              • bob
            • tableC_select
              • root
              • bob
              • john

          Using '_' as a delimiter and constructing several roles per table is slightly non-standard for realms, but it would work. User lists are flat.
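
          To make the naming scheme concrete, here is a minimal sketch of how a per-table check could be phrased against such flat roles. The Realm interface is Tomcat's (see the link above); the wrapper class and method names are hypothetical glue:

          import java.security.Principal;
          import org.apache.catalina.Realm;

          // maps a Hive-level check onto flat realm roles named <table>_<action>
          public class TableRoleCheck {
            private final Realm realm;
            public TableRoleCheck(Realm realm) { this.realm = realm; }

            public boolean canSelect(Principal user, String table) {
              return realm.hasRole(user, table + "_select");
            }
            public boolean canInsert(Principal user, String table) {
              return realm.hasRole(user, table + "_insert");
            }
          }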

          About these permissions:

          • SELECT
          • INSERT
          • ALTER TABLE
          • CREATE
          • DROP

          If an external table is created and my UID has access to the file through HDFS, I would expect to have select access inside Hive. If I could not write the file in HDFS, I would not expect Hive to give me those permissions. I think we should clearly define the difference between AUTHENTICATION and ACCESS.

          For example, the AUTHENTICATION information for a user is commonly stored in Active Directory. However, ACCESS information, like which tables a user may run SELECT on, cannot be stored in Active Directory without changing the Active Directory schema.

          Realm or JAAS gives us a quick way to answer the authentication question. As for ACCESS, we either have to store that information in the metastore or in an external system.

          Ashish Thusoo added a comment -

          The roles are actually per object. I would say that these are at least per table, if not per partition. I don't have a use case for the latter, but separation on the basis of tables is very, very desirable.

          Given that, and the fact that we currently have around 5000 tables in our warehouse, do you have some idea of how realms will scale with such a large number of objects?

          I agree that a generic recursive role infrastructure does not have a lot of utility, but considering that we have so many permissions, I would think it would be quite cumbersome for an administrator to enumerate all of them for every user that is created (though some good defaults can surely alleviate some of the concerns here). So I think being able to package permissions into some higher-level roles would help. Note that we do not need a generic role-within-a-role, but it would be nice for a role to be a set of permissions on certain objects, with the authorization framework able to associate a role or permission with a user.

          The other way to do this is to define groups which can be assigned a set of permissions and a set of users. That level of indirection would also work in reducing the number of user to permission assignments that we would have to make otherwise.

          I agree that authentication and authorization (much of what I have been talking about in this comment) need to be separated out, and while we use the directory infrastructure for authentication, we should store the authorization information in the metastore, as that is specific to our application and no sane directory administrator would allow us to touch the directory to support custom attributes.

          If we do that separation, then Realms perhaps can take care of just the authentication portion, and once the user is authenticated, the authorization infrastructure looks up the user by ID in metastore to figure out what capabilities the user has.

          Is that what you have in mind?

          In this scenario, I presume that we would have a realm for AD and just have all the users authenticate with that realm. So the number of realms would be a function of the number of directories or user repositories as opposed to being a function of the number of objects.

          Edward Capriolo added a comment -

          We also have to look at this on the file system level. For example, files in my warehouse are owned by the user who created the table.

          /user/hive/warehouse/edward <dir> 2008-10-30 17:13 rwxr-xr-x edward supergroup

          Regardless of what permissions are granted in the metastore (via this JIRA), the Hadoop ACL governs what a user can do to that file.

          This is not an issue in MySQL. In a typical MySQL deployment, all of the data files are owned by the mysql user.

          I do not see a clear cut solution for this.

          In one scenario, we make sure all the files in the warehouse are readable and writable by all, or owned by a specific user. A component like HiveServer, the CLI, or HWI would decide whether the user action succeeds based on the metadata.

          The other option is that an operation like 'GRANT SELECT' would have to physically modify the Hadoop ACL/owner. This method will not help us get the fine-grained control we desire.
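
          To illustrate why this second option is so coarse, a minimal sketch of what "physically modify the ACL" would amount to, using Hadoop's FileSystem API; the class and the world-readable policy are hypothetical:

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.fs.permission.FsPermission;

          // hypothetical sketch: a GRANT that relaxes HDFS permissions on the table dir
          public class HdfsGrant {
            public static void grantSelectToAll(String tableDir) throws Exception {
              Configuration conf = new Configuration();
              FileSystem fs = FileSystem.get(conf);
              // world-readable so any user can SELECT; far coarser than per-user grants
              fs.setPermission(new Path(tableDir), new FsPermission((short) 0755));
            }
          }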

          Min Zhou added a comment -

          Is there any further progress on this issue?

          Edward Capriolo added a comment -

          My last comment is a blocker in my mind. How can we implement complex access controls at the Hive level when we have basic ownership issues at the file level? Will daemons like HiveService and HiveWebInterface have to run as supergroup or a hive group? How does this affect the CLI, which will run as the individual user?

          These are not so much Hive issues as environment/setup issues, but I do not want to assume my environment is the target environment. Will we be assuming users are members of a 'hive' posix group, or that all the files in the warehouse are owned by user 'Hive', group 'Hive'? I wanted to get others' opinions on this.

          Edward Capriolo added a comment -

          GRANT

          • SELECT
          • ALTER
          • INSERT
          • UPDATE --RESERVED
          • DROP
          • CREATE

          GLOBAL GRANT PERMISSIONS

          • PROCESS_LIST - list queries
          • PROCESS_KILL - kill a query
          • RC - startup/shutdown
          • WITH_GRANT - give a user permission to grant other permissions

          SPECIAL

          • 'ALL' - all permissions

          Target Objects: ALL, DataBase, Table, Partition, Column

          • Permissions are additive
          • Upper level implies lower level, i.e. select on a table implies select on all columns in the table

          Suggested Syntax

          • GRANT WITH_GRANT,RC, ON '*' TO 'USER1','USER2' AS my_permission
          • GRANT SELECT ON 'cat1','cat2' TO 'USER1' AS my_permission
          • GRANT SELECT ON 'cat1.*', 'cat2.homes.name' TO 'USER4', '%GROUP1' AS my_permission
          • GRANT SELECT on 'cat1.*', 'cat2.homes.PARTITION="5.5.4".owner' TO 'USER5' AS my_permission

          In the metastore we can store the permissions like this:
          PERMISSION SET

          { Vector <User|GROUP> , Vector <TargetObject>, Vector <PRIV>, String Name }
          Prasad Chakka added a comment -

          This is great. I have a few questions:
          1) What would be the syntax to create user/passwd combos and to log in?
          2) Are the permissions stored in the metastore per user, per table, or a combo?
          3) Do we really need groups? I don't think MySQL implements groups.
          4) I am totally naive about authentication systems, but I am assuming only access details are stored in the metastore and authentication is done by one of the systems discussed. Is that correct?

          Edward Capriolo added a comment -

          >> 1) What would be the syntax to create user/passwd combos and to log in?
          The username and password would come externally. I noticed a Hadoop JIRA on authenticating via Kerb4 and LDAP. We are best off splitting authentication and authorization, as we spoke of above. User and group are your external posix user and groups.

          >> 2) Are the permissions stored in the metastore per user, per table, or a combo?
          They should be stored in the metastore. Otherwise, a rule like GRANT * on '' TO '' AS my_permission would have to be stored everywhere, and that would be a PITA.

          >> 3) Do we really need groups? I don't think MySQL implements groups.
          The group is your posix login group. Allowing groups is a simple way to reduce the number of per-user rules.

          >> 4)
          Right again. The separation here is that we let the authentication system carry all the burden of username, groups, and password. The metastore is only concerned with what that user can do inside Hive.

          Ashish Thusoo added a comment -

          I agree, it is best to punt authentication to the authentication systems (LDAP, kerb etc. etc.) and concentrate on authorization (privileges) here.

          About the syntax:

          1. I am not sure what AS is used for.
          2. Column-level permissions are good, but they can perhaps be addressed with views, treating permissions on views as we do for tables.
          3. I would add the keyword TABLE in the GRANT statement, like MySQL, because we may have permissions on user-defined functions and types in the future... so something like:
          GRANT SELECT ON TABLE 'cat1' TO 'USER1'
          4. Also, maybe in the TO clause make the user and group explicit - TO USERS a, b, c GROUPS g1, g2 - otherwise the reader of the command may not know what is a group and what is a user. I presume this would also make the authorization logic somewhat simpler, as you would know exactly what to look for?

          About the blocker that you mentioned, we should perhaps let the Hadoop file permissions be independent of Hive ACLs. Of course, you need both to be able to do anything on the table. It can be tricky though... Will spend a bit more time thinking about this - this looks pretty cool...

          Edward Capriolo added a comment -

          All those points make sense.

          >>1. I am not sure what AS is used for.
          I am thinking AS is the way to name the PermissionSet. Imagine a rule like this:

          GRANT WITH_GRANT,RC, ON '*' TO 'USER1','USER2' AS my_permission
          

          At some point 'USER3' might become an administrator. It would be nice to issue a command like:

          ALTER GRANT my_permission add USER 'USER3'
          

          It also makes the grant self-documenting.

          Edward Capriolo added a comment -

          The metastore would be a good place to start the ball rolling. Any comments?

          Min Zhou added a comment -

          We will take over this issue; it should be finished in two weeks. Here are the SQL statements that will be added:

          CREATE USER;
          DROP USER;
          ALTER USER SET PASSWORD;
          GRANT;
          REVOKE;
          

          Metadata is stored in some sort of persistent medium, such as a MySQL DBMS, through JDO. We will add three tables for this issue: USER, DBS_PRIV, and TABLES_PRIV. Privileges can be granted at several levels; each table above corresponds to a privilege level.

          1. Global level
            Global privileges apply to all databases on a given server. These privileges are stored in the USER table. GRANT ALL ON *.* and REVOKE ALL ON *.* grant and revoke only global privileges.
            GRANT ALL ON *.* TO 'someuser';
            GRANT SELECT, INSERT ON *.* TO 'someuser';
          2. Database level
            Database privileges apply to all objects in a given database. These privileges are stored in the DBS_PRIV table. GRANT ALL ON db_name.* and REVOKE ALL ON db_name.* grant and revoke only database privileges.
            GRANT ALL ON mydb.* TO 'someuser';
            GRANT SELECT, INSERT ON mydb.* TO 'someuser';
            Although we can't create DBs currently, this reserves a place until Hive supports them.
          3. Table level
            Table privileges apply to all columns in a given table. These privileges are stored in the TABLES_PRIV table. GRANT ALL ON db_name.tbl_name and REVOKE ALL ON db_name.tbl_name grant and revoke only table privileges.
            GRANT ALL ON mydb.mytbl TO 'someuser';
            GRANT SELECT, INSERT ON mydb.mytbl TO 'someuser';

          Hive account information is stored in the USER table, including username, password, and the various privileges. A user who has been granted any privilege on a particular table, such as select/insert/drop, always has the right to show that table.
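
          A minimal sketch of the three-level lookup this design implies; the PrivStore interface and all names here are hypothetical stand-ins for queries against the USER, DBS_PRIV, and TABLES_PRIV tables:

          public class PrivCheck {
            public enum Level { GLOBAL, DATABASE, TABLE }

            // hypothetical store backed by the USER / DBS_PRIV / TABLES_PRIV tables
            public interface PrivStore {
              boolean has(String user, Level level, String db, String table, String priv);
            }

            // a privilege is granted if it is held at any of the three levels
            public static boolean allowed(PrivStore store, String user,
                                          String db, String table, String priv) {
              return store.has(user, Level.GLOBAL, null, null, priv)
                  || store.has(user, Level.DATABASE, db, null, priv)
                  || store.has(user, Level.TABLE, db, table, priv);
            }
          }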

          Min Zhou added a comment -

          We currently use separate MySQL DBs to achieve an isolated CLI environment, which is not practical. An authentication infrastructure is urgently needed for us.

          Almost all statements would be influenced, for example:
          SELECT
          INSERT
          SHOW TABLES
          SHOW PARTITIONS
          DESCRIBE TABLE
          MSCK
          CREATE TABLE
          CREATE FUNCTION - we are considering how to control people creating UDFs
          DROP TABLE
          DROP FUNCTION
          LOAD
          along with GRANT/REVOKE themselves, and CREATE USER/DROP USER/SET PASSWORD. This even includes some non-SQL commands like set, add file, and add jar.

          Edward Capriolo added a comment -

          Min,

          First, let me say you have probably come along much further on this issue than I have.

          Your approach is too strong. Hive is an open-community process. Though it is not very detailed, we have loosely agreed on a spec (above); in that spec we decided not to store username/password information in Hive. Rather, upstream is still going to be responsible for this information. We also agreed on syntax.

          You should not throw up a new spec and some code and say something along the lines of "We are going to take over and do it this way". Imagine if, on each JIRA issue you were working on, you were 20% to 50% done, and then someone jumped in and said "I already finished it a different way" - that would be rather annoying. It would be a "first patch wins" system.

          First, before you write a line of code, you should let someone know your intention to work on it. Otherwise, what is the point of having two people work on something where one version gets thrown away? It is a waste, and this would be the second issue where this has happened to me.

          Second, even if you want to start coding it up, it has to be what people agreed on. We agreed not to store user/pass (Hadoop will be doing this upstream soon), and we agreed on syntax; if you want to reopen that issue, you should discuss it before coding it. It has to be good for the community, not just your deployment.

          So where do we go from here? Do we go back to the design phase and describe all the syntax we want to support?

          Min Zhou added a comment -

          @Edward

          Sorry for my abuse of some words; I hope this will not affect our work.

          Can you point me to the JIRAs where it was decided not to store username/password information in Hive and that Hadoop will?
          I think most companies are using Hadoop versions from 0.17 to 0.20, which don't have good password security. Once a company adopts a particular version, upgrading is a very important issue, and many companies will stick with a more stable version. Moreover, Hadoop still does not have that feature, and it may take a very long time to implement. Why should we wait rather than accomplish it ourselves? I think it is necessary for Hive to support user/password, at least for current versions of Hadoop. Many companies using Hive have reported that current Hive is inconvenient for multi-user use, in terms of environment isolation, table sharing, security, etc. We must try to meet the requirements of most of them.

          Regarding the syntax, I guess we can do it in two steps.

          1. Support GRANT/REVOKE of privileges to users.
          2. Support some sort of server administration privileges, as Ashish mentioned.
            The GRANT statement enables system administrators to create Hive user accounts and to grant rights to accounts. To use GRANT, you must have the GRANT OPTION privilege, and you must have the privileges that you are granting. The related REVOKE statement enables administrators to remove account privileges.

          File hive-78-syntax-v1.patch modifies the syntax. Any comments on that?

          Namit Jain added a comment -

          I think we should spend some time finalizing the functionality before implementing it - it is very difficult to change something once it is out, due to all kinds of backward-compatibility issues.

          For the AS syntax: won't it be simpler to add permissions to a role, and then assign roles to a user?

          GRANT WITH_GRANT,RC, ON '*' TO 'USER1','USER2' AS my_permission

          ALTER GRANT my_permission add USER 'USER3'

          Can I revoke some privileges from my_permission?

          If yes, how is it different from doing the two things separately?

          CREATE ROLE my_permission AS GRANT WITH_GRANT,RC, ON '*' ;
          GRANT my_permission to USER1, USER2;

          later

          GRANT my_permission to USER3;

          Edward Capriolo added a comment -

          @namit,

          I think I can explain why AS made sense at the time. My plan was not to decouple users from a rule. See my little patch:

          +struct AccessControl {
          +  1: list<string>	user,
          +  2: list<string>	group,
          +  3: list<string>	database,
          +  4: list<string>	table,
          +  5: list<string>	partition,
          +  6: list<string>	column,
          +  7: list<string>	priv,
          +  8: string		name
          +}
          

          I wanted the rule to be more or less immutable, or to support a really simple syntax.

          Something like this is doable:

          GRANT my_permission to USER3;
          

          But it seems to imply that users are decoupled from the rule. This is really not true (in my design); a user or group is just another multivalued attribute of the rule.

          I would like the format to be interchangeable:

          ALTER my_permission add db 'db';
          ALTER my_permission add table 'db.table';
          ALTER my_permission drop table 'db.table';
          

          @Min,
          Above in this JIRA, see Ashish's comment:

          I agree, it is best to punt authentication to the authentication systems (LDAP, kerb etc. etc.) and concentrate on authorization (privileges) here. 
          

          The goal here is to trust the user/group information as Hadoop does, and create a system that grants/revokes privileges. Authentication and authorization are two separate things, so our JIRA is misnamed.

          I will review your patch, just to see what you came up with. As I said, you are further along than I am, and this has been off my radar, so I don't mind passing the baton. But Namit is right: we have to agree on the syntax and on what we are controlling, because down the road it will be an issue.

          Ashish Thusoo added a comment -

          @Min

          I agree with Edward's thoughts here. We have to foster a collaborative environment and not be dismissive of each other's ideas and approaches. Much of the work in the community happens on a volunteer basis, and whatever time anyone puts into the project is a bonus and should be respected by all.

          It does make sense to keep authentication separate from authorization, because in most environments there are already directories which deal with the former. Creating yet another store for passwords just leads to an administration nightmare, as the account administrators have to create accounts for new users in multiple places. So let's just focus on authorization and let the directory infrastructure deal with authentication. Will look at your patch as well.

          Min Zhou added a comment -

          Let me guess: you are all talking about the CLI. But we are using HiveServer as a multi-user server, not supporting just one user like mysqld does.

          Edward Capriolo added a comment -

          @Min

          I would think the code should apply to any client: the CLI, HiveServer, or HWI.

          We should probably also provide a configuration variable:

          <property>
             <name>hive.authorize</name>
             <value>true</value>
          </property>
          
          Min Zhou added a comment -

          I do not think the HiveServer in your mind is the same as mine, which supports multiple users, not only one.

          Min Zhou added a comment -

          From the words you commented:

          Daemons like HiveService and HiveWebInterface will have to run as supergroup or a hive group? 
          
          Edward Capriolo added a comment -

          @Min
          (This may be somewhat misstated, but) Hadoop Core gets the user/group information for a posix user by running shell commands like whoami, groups, id, etc. The Hive CLI inherits this information, as do HiveServer and HWI.

          The Hive web interface starts as the user who ran the start script. The first screen of the web interface is a de facto log-in screen. This allows the user to enter their user and group information in text boxes.

          When HWI starts the session on behalf of the user, it runs "SET hadoop.ugi={what the user entered in the text box}". At that point, if the user initiates a Hive job, the output of that job should be files owned by that user. I am pretty sure the code in QL just chowns the files at job end, or perhaps the entire job runs as that user (I can't remember).

          My comment above is just referencing the fact that in some cases the Hadoop ACL and our Hive authorization rules would conflict. I.e., if the files were owned by mzhou, saying "grant delete to * user edward" would not give me privileges to drop files you owned. In that case, sections of the HiveServer would have to run as the superuser to elevate privileges, but we punted on that issue too. (We are like a football team with a bad offense: always punting.)

          (If we were going to tackle passwords, we could do it in this way.)
          I would think that if we wanted to enforce strong user/password authentication, we could do this:

          <property>
             <name>hive.password.insession</name>
             <value>hive_password</value>
             <description>Empty for no password checking; if defined, this is the session variable to look for the password.</description>
          </property>
          

          In this way, QL would read this value and would not execute any task for the user unless they had run "set hive_password=XYXYXYY".
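
          A minimal sketch of such a guard, assuming the hive.password.insession scheme above; the class and the source of the expected password are hypothetical:

          import java.util.Map;

          public class SessionPasswordGuard {
            private final String passwordVar; // e.g. "hive_password"; empty disables checking

            public SessionPasswordGuard(String passwordVar) {
              this.passwordVar = passwordVar;
            }

            // true when checking is disabled or the session carries the expected password
            public boolean permits(Map<String, String> sessionVars, String expected) {
              if (passwordVar == null || passwordVar.length() == 0) {
                return true;
              }
              String supplied = sessionVars.get(passwordVar); // from "set hive_password=..."
              return supplied != null && supplied.equals(expected);
            }
          }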

          Does that make sense? Session already holds the user. It could hold the password as well. Do you see anything wrong with that approach?

          I will trim down some of the stuff I have and upload it for reference.

          Namit Jain added a comment -

Copying an earlier comment from the jira:

I agree that authentication and authorization (much of what I have been talking about in this comment) need to be separated out, and while we use the directory infrastructure for authentication, we should store the authorization information in the metastore, as that is specific to our application and no sane directory administrator would allow us to touch the directory to support custom attributes.

I agree with the above - it might be a good idea to not do password handling in hive in the first step; we can add it later if need be. Let us assume that the user has already been authenticated by some external entity,
and proceed from there. What do you think?

Edward Capriolo added a comment -

          @namit,

          Yes, I agree/agreed. I was off topic there, describing how we could do it if we wanted to. I will open a separate Jira for that.

At the upcoming Hadoop World NYC, someone is going to present the new authentication code in Hadoop. I would like to watch that; then we (I) might better understand what the long-term strategy is for Hadoop. I will split authentication and authorization into two separate Jiras to avoid confusion.

Edward Capriolo added a comment -

This deals with authorization, not authentication.

Edward Capriolo added a comment -

          Fit authorization and authentication together

Min Zhou added a comment -

Well, I wrote another Authorizer like your Authenticator yesterday:

public enum Privilege {
  SELECT_PRIV,
  INSERT_PRIV,
  CREATE_PRIV,
  ALTER_PRIV,
  DROP_PRIV,
  CREATE_USER_PRIV,
  GRANT_PRIV,
  SUPER_PRIV
}

public interface Authenticator {
  public boolean authenticate(Privilege priv);
  public boolean authenticate(Privilege priv, Table table);
  public boolean authenticate(Privilege priv, List<Table> tables);
}

public class GenericAuthenticator {
  public GenericAuthenticator(Hive db, User user);
  ...
}
          

and added an Authenticator instance into the thread-local SessionState.

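For reference, the thread-local wiring could look roughly like this (a self-contained sketch; this SessionState is only a stand-in for Hive's real class, and the ThreadLocal pattern is the point, not the fields):

// Illustrative stand-in for Hive's SessionState, holding the Authenticator
// sketched above; the real class carries much more state.
public class SessionState {
  private static final ThreadLocal<SessionState> tss =
      ThreadLocal.withInitial(SessionState::new);

  private Authenticator authenticator;

  public static SessionState get() {
    return tss.get();
  }

  public Authenticator getAuthenticator() {
    return authenticator;
  }

  public void setAuthenticator(Authenticator authenticator) {
    this.authenticator = authenticator;
  }
}
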
Min Zhou added a comment -

Sorry, that should have been:

public class GenericAuthenticator extends Authenticator {
  public GenericAuthenticator(Hive db, User user);
  ...
}
Edward Capriolo added a comment -

          @Min,

I think you are on the right track, but you might have your terminology mixed up. In AAA:
The first A is authentication, which usually implies supplying a user/password.
The second A is authorization, which means determining what privileges the user has.
The third A is accounting (we already have that).

The interfaces you supplied above look like an Authorizer, not an Authenticator. I think:

public interface Authorizer {
  public boolean authorize(Privilege priv);
  public boolean authorize(Privilege priv, Table table);
  public boolean authorize(Privilege priv, List<Table> tables);
}
          

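To illustrate how such an interface might be backed, here is a toy implementation (a sketch only: this Table is a stand-in for Hive's real class, and the in-memory grant maps take the place of the metastore):

import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy stand-in for Hive's Table.
class Table {
  final String name;
  Table(String name) { this.name = name; }
}

// A toy Authorizer (implementing the interface above) backed by in-memory grants.
public class SimpleAuthorizer implements Authorizer {
  private final Set<Privilege> globalGrants;
  private final Map<String, Set<Privilege>> tableGrants;

  public SimpleAuthorizer(Set<Privilege> globalGrants,
                          Map<String, Set<Privilege>> tableGrants) {
    this.globalGrants = globalGrants;
    this.tableGrants = tableGrants;
  }

  @Override
  public boolean authorize(Privilege priv) {
    return globalGrants.contains(priv) || globalGrants.contains(Privilege.SUPER_PRIV);
  }

  @Override
  public boolean authorize(Privilege priv, Table table) {
    if (authorize(priv)) {
      return true; // a global grant covers table-level access
    }
    return tableGrants
        .getOrDefault(table.name, EnumSet.noneOf(Privilege.class))
        .contains(priv);
  }

  @Override
  public boolean authorize(Privilege priv, List<Table> tables) {
    // Every table in the list must be authorized for the access to succeed.
    return tables.stream().allMatch(t -> authorize(priv, t));
  }
}
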
But you seem to be on a roll. I will hang back and wait to see what you come up with.

Min Zhou added a comment -

Oops, my code wasn't on my machine; I just pasted yours and modified it into mine.
Here is a patch showing my code for that.

Namit Jain added a comment -

          Looking at Min's patch createuser-v1.patch,

I don't think we need create user/drop user etc. at all.

          As Edward mentioned before,
          When HWI starts the session on behalf of the user it runs "SET hadoop.ugi=

{what user entered in the text box}

          " at that point if the user initiates a hive job, the output of that job should be files owned by that user. I am pretty sure the code in QL just chown's the files at job end or perhaps the entire job runs as that user (I cant remember).

The user is always available from the environment, and for now let us assume that all authorizations apply to that user.

Min Zhou added a comment -

          @Namit

Got your meaning. We are maintaining a version of our own; it will need a couple of weeks to adapt it to trunk.

Royce Rollins added a comment -

I'm very interested in working on this issue this week but don't want to tread on anyone's work. What's the status?
Is anything checked in yet? I'd like to get this done as soon as possible.

Amr Awadallah added a comment -

I am also very curious what the latest is on this jira; there have been no updates since Sept of last year. Min, did you stop working on this?

          – amr

Namit Jain added a comment -

Is anyone working on this?

Carl Steinbach added a comment -

          Authorization proposal on the wiki: http://wiki.apache.org/hadoop/Hive/AuthDev

Namit Jain added a comment -

          Please comment - we would like to hear all use cases before finalizing the design.

dhruba borthakur added a comment -

Can somebody please comment on how this ties in with HDFS permissions/authorization? There is a small subsection in the doc about this issue, but I am unable to understand that part.

He Yongqiang added a comment -

          @dhruba
HDFS has its own authorization. So if we allow an access at the Hive layer and pass that access down to HDFS (by setting the correct hdfs username and groups), the job can still fail with an HDFS permission problem.
So we need to solve the problem of two layers of independent authorization.
One way is to allow all accesses to HDFS and let Hive do the authorization, so Hive runs as root in terms of HDFS.
The other way is to plug HDFS authorization into the Hive layer, and only accept an access if both Hive and HDFS say YES. A user belongs to different unix groups, and hdfs permissions are set based on the unix group. [I am not sure how many groups a user can have in terms of HDFS; I mean how many group settings you can put on an hdfs file. Let's simply say I want these 2 groups to be able to read the file.] Another problem is column-level privileges.
This is very open for discussion; please comment on it.

          About the proposal, there is one authorization rule that we are not sure about. It's the simple rule: one deny then deny.

Take this example:
5.3.1: I want to grant everyone (new people may join at any time) access to db_name.*, and then later I want to protect one table db_name.T from ALL users but a few.
1) Add all users to a group 'users'. (Assumption: new users will automatically join this group.) Grant 'users' ALL privileges on db_name.*
2) Add those few users to a new group 'users2', AND REMOVE them from 'users'.
3) DENY 'users' access to db_name.T.
4) Grant ALL on db_name.T to 'users2'.

          The main problem in this approach is that "REMOVE them from 'users'" is not practicable.

The other option that we have thought about is another rule.

First, try the user name:

First try to deny the access by looking up the deny tables by user name:

1. If there is an entry in 'user' that denies this access, return DENY
2. If there is an entry in 'db' that denies this access, return DENY
3. If there is an entry in 'table' that denies this access, return DENY
4. If there is an entry in 'column' that denies this access, return DENY

If we get one deny, we return DENY for this attempt.

If the deny lookups fail, go through all privilege levels with the user name:

5. If there is an entry in 'user' that accepts this access, return ACCEPT
6. If there is an entry in 'db' that accepts this access, return ACCEPT
7. If there is an entry in 'table' that accepts this access, return ACCEPT
8. If there is an entry in 'column' that accepts this access, return ACCEPT

Second, try the user's group/role names one by one until we get an ACCEPT. If we get an ACCEPT from one group/role, we ACCEPT this access; else we deny.

For each role/group, we do the same routine as we did for the user name.
The problem with this approach is that it's a little bit complex, and we did not find any system that uses it. For mysql, there is no deny; for sql server, it's one deny then deny.

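To pin down one reading of that rule, here is a sketch in Java (purely illustrative: PrivStore and the four lookup levels are stand-ins for the metastore tables described above, and the handling of a group-level deny is one interpretation of the text):

import java.util.List;

// Hypothetical sketch of the deny-first rule described above. The four
// levels mirror the lookup tables: user, db, table, column.
public class DenyFirstRule {

  public interface PrivStore {
    // True if the principal has a DENY entry for this access at this level.
    boolean hasDeny(String principal, String level);
    // True if the principal has an ACCEPT entry for this access at this level.
    boolean hasAccept(String principal, String level);
  }

  private static final String[] LEVELS = {"user", "db", "table", "column"};

  /** Returns true (ACCEPT) or false (DENY) for one access attempt. */
  public static boolean authorize(PrivStore store, String user, List<String> groupsAndRoles) {
    // First: the user name itself. A deny here denies the whole attempt.
    Boolean byUser = tryPrincipal(store, user);
    if (byUser != null) {
      return byUser;
    }
    // Second: the group/role names, one by one, until one yields an ACCEPT.
    for (String principal : groupsAndRoles) {
      Boolean result = tryPrincipal(store, principal);
      if (result != null && result) {
        return true;
      }
    }
    return false; // no accept anywhere: deny
  }

  // Same routine for a group/role as for the user name: any deny blocks this
  // principal; otherwise look for an accept at any level.
  private static Boolean tryPrincipal(PrivStore store, String principal) {
    for (String level : LEVELS) {
      if (store.hasDeny(principal, level)) {
        return false;
      }
    }
    for (String level : LEVELS) {
      if (store.hasAccept(principal, level)) {
        return true;
      }
    }
    return null; // no entry either way for this principal
  }
}
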
He Yongqiang added a comment -

The other option we came up with in offline discussion is the rule of "one accept then accept", but in a hierarchical style. First check the privileges granted to the user and its groups: one accept then accept; one deny then deny. Then check role-level privileges: one accept then accept; one deny then deny.

We prefer to go with this rule. Please comment, and if there are no concerns, I will update the wiki.

He Yongqiang added a comment -

Sorry, in the previous comment, by "one accept then accept; one deny then deny" I meant "accept overrides deny: one accept then accept; no accept then deny".

Todd Lipcon added a comment -

          I'm a little unclear on how the user identity is passed down to the MR layer. Carl and I had chatted about this a few weeks back – is the idea now that all hive queries will run MR jobs as a "hive" user, rather than "todd"? If so, we need to add authorization control for UDFs and TRANSFORM as well, since a user could trivially take over the "hive" user credentials from within a UDF. If the MR jobs will continue to run as "todd", then I don't understand how we can apply any permissions model that is any different than HDFS permissions. More restrictive is impossible because I can just read the files myself, and less restrictive is impossible because HDFS is applying permissions based on the "todd" identity.

Carl Steinbach added a comment -

          The issue that Todd raised is pretty important and needs to be addressed in the proposal.
          My personal opinion is that running all queries as a "hive" super-user is the most
          practical approach and will also yield behavior that is familiar to users of traditional
          RDBMS systems (who I expect will increasingly define the average Hive user/administrator).

          There are some other follow-on issues that need to be decided if we end up settling
          on this approach:

          • This approach to authorization presupposes that users are accessing Hive through a HiveServer process. This follows from the fact that A) you want Hive to execute the query plans as the Hive superuser, and B) that user can circumvent the authorization model if they are given direct access to the MetaStore DB. It would be nice if the proposal explicitly stated this requirement and mentioned some of the follow-on work that this necessitates, e.g. fixing concurrency issues in HiveServer, reducing the memory requirements of HiveServer, etc.
• We need to apply the authorization model to the 'add [archive|file|jar]' commands as well as add temporary function. add jar and add file both currently allow the user to inject code into MR jobs, and add jar in conjunction with add temporary function allows the user to inject and execute arbitrary code within the HiveServer process. We may also want to add a new add executable command for adding executable scripts that has a different permission model than add file.
          • I think there also may be security issues stemming from external tables, e.g. if I create an external table that points to another user's home directory and then run a query on it which executes with Hive's superuser permissions.
• Loading data into the Hive warehouse from an arbitrary HDFS location and exporting data to other locations in HDFS are two issues that need to be considered. In each case I think the correct behavior depends on both the Hive process's permissions and those of the user.
He Yongqiang added a comment -

Bypassing the hdfs permissions from the hive layer is just one option. The implementation should also support setting user groups on the hdfs side and letting the mapreduce job run as the user.

Just a quick update about the authorization rule:

In the offline discussion we had internally this afternoon, removing DENY came up as another option to be considered. We examined our use cases without DENY, and it works. Removing DENY from the authorization model will simplify the implementation a lot.

And regarding views and indexes: for the first version, we should not do them. We can do them later, when we have a better understanding after we implement the first version.

Namit Jain added a comment -

Overall, there are many security holes in the system, and we are not proposing to close all of them.

To start with, this is an attempt to help good users; it is not meant for malicious users -
the idea is to prevent good users from committing a mistake.

John Sichi added a comment -

          (implementation note)

          If we really need multiple metastore tables, let's name them consistently:

          user_priv
          db_priv
          tbl_priv
          col_priv

Carl Steinbach added a comment -

          @Namit: I think it's fine to take an incremental approach with this, but then it's important
          to spell out what the known security holes are so users and administrators
          know what they're getting. Otherwise we're going to spend a lot of time answering
          questions on the hive-user list.

He Yongqiang added a comment -

Attaching two patches. One includes the thrift-generated code in case anyone wants to try it.
The other is just the java code changes, for a clean review.

These two patches only contain DDL and metadata changes. There is no integration code with the query execution part yet; will do that in a following patch.

          Some examples:

          > show grant user `test` on table `src`;
          OK
          Time taken: 0.081 seconds

          hive> grant `select` on table src to user test, group grp;
          OK
          Time taken: 0.118 seconds

          hive> show grant user `test` on table `src`;
          OK
          dbName:default
          tableName:src
          userName:test
          isRole:false
          isGroup:false
          privileges:Select
          grantTime:1288850969
          grantor:
          grantor:
          Time taken: 0.09 seconds

          hive> show grant group `grp` on table `src`;
          OK
          dbName:default
          tableName:src
          userName:grp
          isRole:false
          isGroup:true
          privileges:Select
          grantTime:1288850969
          grantor:
          grantor:
          Time taken: 0.08 seconds

          hive> revoke `select` on table src from user test;
          OK
          Time taken: 0.041 seconds

          hive> show grant user `test` on table `src`;
          OK
          Time taken: 0.078 seconds

          hive> show grant group `grp` on table `src`;
          OK
          dbName:default
          tableName:src
          userName:grp
          isRole:false
          isGroup:true
          privileges:Select
          grantTime:1288850969
          grantor:
          grantor:
          Time taken: 0.079 seconds

          >grant `select`(key, value) on table src to user test;
          OK
          Time taken: 0.174 seconds

          > show grant user `test` on table `src`(key);
          OK
          dbName:default
          tableName:src
          columnName:key
          userName:test
          isRole:false
          isGroup:false
          privileges:Select
          grantTime:1288851160
          grantor:
          grantor:
          Time taken: 6.722 seconds

          hive> show grant user `test` on table `src`(key, value);
          OK
          dbName:default
          tableName:src
          columnName:key
          userName:test
          isRole:false
          isGroup:false
          privileges:Select
          grantTime:1288851160
          grantor:
          dbName:default
          tableName:src
          columnName:value
          userName:test
          isRole:false
          isGroup:false
          privileges:Select
          grantTime:1288851160
          grantor:
          grantor:

John Sichi added a comment -

https://reviews.apache.org/r/55/diff/#index_header

John Sichi added a comment -

It looks like HIVE-78.1.nothrift.patch still has a bunch of thrift-generated files in it (metastore/src/gen-javabean/org/apache/hadoop/hive/metastore/api/*)

Pradeep Kamath added a comment -

Will there be a way to turn off authorization (through some configuration property), OR is there a way to allow all access, OR is the authorization implementation going to be pluggable? Since Howl is looking at a different authorization model based on dfs permissions, one of these options would be needed for Howl.

He Yongqiang added a comment -

          >>Will there be a way to turn off authorization (through some configuration property)
          Yes.
          >>is authorization implementation going to be pluggable?
          Yes. This is exactly what we wanted.

          I think Howl can just plug in its own authorization implementation.

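As a sketch of what "pluggable" could look like here (illustrative only - the interface, the allow-all class, and the config key named in the comment are assumptions, not the committed API):

// Hypothetical plugin interface; the real patch defines its own
// (later review comments suggest the name HiveAuthorizationProvider).
interface AuthorizationProvider {
  void authorize(String user, String dbName, String tableName, String privilege)
      throws SecurityException;
}

// A provider that allows everything - what a deployment like Howl could plug
// in to effectively turn Hive-level authorization off.
class AllowAllAuthorizationProvider implements AuthorizationProvider {
  @Override
  public void authorize(String user, String dbName, String tableName, String privilege) {
    // No-op: every access is allowed at the Hive layer.
  }
}

public class AuthorizationPluginDemo {
  public static void main(String[] args) throws ReflectiveOperationException {
    // In Hive the implementation class name would come from a config
    // property (name assumed here, e.g. hive.security.authorization.manager).
    String impl = "AllowAllAuthorizationProvider";
    AuthorizationProvider provider = (AuthorizationProvider)
        Class.forName(impl).getDeclaredConstructor().newInstance();
    provider.authorize("pradeep", "default", "src", "select"); // does not throw
  }
}
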
He Yongqiang added a comment -

          Attached 2 new draft patches.

There may be some bugs since I only did a few simple tests, but I think they are ready for early review.

          HIVE-78.2.nothrift.patch does not include the thrift changes.
          HIVE-78.2.thrift.patch is a complete patch.

Namit Jain added a comment -

Can you add the tests to the non-thrift patch? That makes it easier to review.

Namit Jain added a comment -

Also, can you refresh and re-upload the patch? It does not apply cleanly, and it is therefore not possible to actually compile/test and understand it.

Namit Jain added a comment -

A few minor comments:

1. Can you add more comments in the M* files (the new files in the metastore)?
2. MRoleEntity needs a database name - so does the thrift file?
3. Can you verify that create and create table as select work for hive replication?
4. Can you check who adds inputs/outputs for locking operations?

Namit Jain added a comment -

Driver:

// do the authorization check
if (HiveConf.getBoolVar(conf,
    HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED)) {
  boolean pass = doAuthorization(sem);
  if (!pass) {
    console.printError("Authrization failed (not enough privileges found to run the query.).");
    return (400);
  }
}

Can we print the reason - which privilege was missing?

Can we optimize this scenario - we are checking all partitions one by one,
both for inputs and outputs? If the user/group/role has the table
privilege, we don't need to go over all the partitions one by one.
We can even do this in a follow-up.

Why do we need the change in QueryPlan?

showGrants: should the output have a schema? Going forward, it will
be easier for JDBC clients to parse.

No need to change WriteEntity etc.?

'user' cannot be made a reserved word - ~20 tables at facebook have a column called 'user' -
please check 'role' and 'option' as well.

          SemanticAnalyzer: 3511 not needed

          What happens to replication of roles - needs to be done

Where are the privileges copied for a newly created partition?

Namit Jain added a comment -

          In case of dynamic partitions, you can also have DummyPartition outputs.
          They will contain the correct Table definition.
Are you taking care of them?

He Yongqiang added a comment -

          >>Can you check who adds inputs/outputs for locking operations ?

It seems there are no inputs and outputs for lock/unlock.

He Yongqiang added a comment -

Attached 2 new patches.

HIVE-78.4.complete.patch is a complete patch.
HIVE-78.4.no_thrift.patch does not contain the thrift-generated code.

He Yongqiang added a comment -

          A new patch.

We had an internal group code review; the main changes are:
1) Instead of calling the metastore again to get a partition's privilege information, pack the user's privileges into the Partition object when getting the partition.
2) Added a few configs for the grant behavior on new tables.

          <property>
          <name>hive.exec.security.authorization.table.owner.grants</name>
          <value></value>
          <description>the privileges automatically granted to the owner</description>
          </property>

          <property>
          <name>hive.exec.security.authorization.table.user.grants</name>
          <value></value>
          <description>the privileges automatically granted to some users whenenve a table gets created.
          An example like "userX,userY:select;userZ:create" will grant select privilege to userX and userY,
          and grant create privilege to userZ whenenve a new table created.</description>
          </property>

          <property>
          <name>hive.exec.security.authorization.table.group.grants</name>
          <value></value>
          <description>the privileges automatically granted to some groups whenenve a table gets created.
          An example like "groupX,groupY:select;groupZ:create" will grant select privilege to groupX and groupY,
          and grant create privilege to groupZ whenenve a new table created.</description>
          </property>

          <property>
          <name>hive.exec.security.authorization.table.role.grants</name>
          <value></value>
          <description>the privileges automatically granted to some groups whenenve a table gets created.
          An example like "roleX,roleY:select;roleZ:create" will grant select privilege to roleX and roleY,
          and grant create privilege to roleZ whenenve a new table created.</description>
          </property>

3) Changed the privilege 'Overwrite' to 'update'.

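Since the grant-string format in those descriptions ("userX,userY:select;userZ:create") is easy to mis-read, here is a tiny parsing sketch showing how such a value decomposes (illustrative only, not the patch's actual parser):

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative parser for values like "userX,userY:select;userZ:create":
// semicolons separate grant clauses, a colon separates the principal list
// from the privilege, and commas separate principals.
public class GrantStringDemo {
  public static Map<String, List<String>> parse(String value) {
    Map<String, List<String>> privToPrincipals = new LinkedHashMap<>();
    for (String clause : value.split(";")) {
      String[] parts = clause.split(":", 2);
      List<String> principals =
          privToPrincipals.computeIfAbsent(parts[1], k -> new ArrayList<>());
      for (String p : parts[0].split(",")) {
        principals.add(p.trim());
      }
    }
    return privToPrincipals;
  }

  public static void main(String[] args) {
    // Prints {select=[userX, userY], create=[userZ]}
    System.out.println(parse("userX,userY:select;userZ:create"));
  }
}
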
John Sichi added a comment -

          Taking a first look at this one; I will have a number of suggestions on naming/structure for thrift and JDO. I think you accidentally omitted the org.apache.hadoop.hive.ql.security.authorization package since I see references to it but no code.

He Yongqiang added a comment -

You can find it in the complete patch.

Will rebase the patch against the new thrift.

He Yongqiang added a comment -

Refreshed the patch against trunk.

John Sichi added a comment -

          First batch of review comments.

          JDO:

          • Do we want roles to be contained by databases? Let's discuss this
            at next design review.
• Instead of two separate flags (IS_ROLE/IS_GROUP), should we instead use
  an enum for principal type { USER, GROUP, ROLE }?

          • Naming suggestions (if accepted, propagate to Thrift API also):
            • SECURITYROLE -> ROLES
            • SECURITYROLEMAP -> ROLE_MAP
            • SECURITYUSER -> GLOBAL_PRIVS
            • SECURITYDB -> DB_PRIVS
            • SECURITYTBLPART -> TBLPART_PRIVS
            • SECURITYCOLUMN -> COL_PRIVS
          • VARCHAR precision for "privileges" fields should be 4000
          • Since we're going to need to record GRANT OPTION eventually, maybe
            we should add it now so that we don't have to ALTER TABLE later?

          Thrift API:

          • Avoid embedding objects inside of other objects except where
            necessary. For example, in the definition of struct Role, use
            dbName instead of a Database object (assuming we keep roles as
            contained by databases). Likewise, in PrivilegeBag, the map keys
            should be identifiers, not objects. This applies to quite a few of
            the new structs.
          • Can we reduce the number of new structs and API calls by
            consolidating different object types? For example, for the
            get_XXX_privilege_set calls, just have one, and take object
            type+identifier.
          • Add comments for all new methods.

          Config:

          • Why is hive.exec.security used for some config params instead of
            hive.security? Also, those parameter names should make it clear
            that they are default grants. Also, do we really need owner grants
            (don't owners automatically have full privileges implicitly)?
          • Looks like hive.variable.substitute crept in from some other patch.
          • Comments for plugin-loading parameters should make it explicit
            exactly which interface they are supposed to implement.
          • Comment for role grants says "to some groups" instead.

          Pluggable Interfaces:

          • I don't think we need the factory classes; just add new methods to
            HiveUtils (and follow the classloading pattern used there)
          • Rename AuthorizationProvider to HiveAuthorizationProvider
            and make it extend Configurable
          • Rename AuthorizationProviderManager to AbstractAuthorizationProvider
          • All outside references should be to the interface (HiveAuthorizationProvider)
            not the abstract class.
          • Rename Authenticator to HiveAuthenticationProvider and make it
            extend Configurable
          • Javadoc?

          Typos:

          • principla
          • Authrization
          • GrantInfor
          • privielges
          • "Table is partitioned, but partition spec found"
          • DummpyAuthenticator
          • detroy
          • wheenve

          Implementation:

          • why does doAuthorization return a boolean when it just throws
            anyway?
          • more coming...
John Sichi added a comment -

          Some more from me:

          • There's a bug when attempting to grant multiple privileges at once;
            only one of them is getting granted (what I showed you in CLI)
          • Multiple grants from the same grantor to the same grantee should not
            result in duplicates (verify against Oracle), and we should collapse
            everything into one row no matter whether the grants were made at
            the same or different times (sort privilege names for determinism)
          • revokeAllPrivileges should revoke role grants as well
          • Role cycle is not being prevented
          • try/finally around transactions in ObjectStore should be used
            consistently (I know there are some cases which were already missing
            them, but we shouldn't make it worse)
          • Don't use printStackTrace
          • show [role] grant role unknown should fail (even though we have to
            tolerate unknown for user/group since we don't have a table for those)

          Some additional points noted at code review session:

          • Need many many negative tests
          • Provide a way to make partitions inherit from table (and make it the
            default)
          • Define a UNIQUE key for the priv tables in JDO
          • GRANT should mark WriteEntity for replication etc

          More Typos:

          • candicate
          • anaylze

          I have some more code-level comments but not all of them may be relevant after
          the issues above have been resolved, so I'll do another pass after the
          next patch.

He Yongqiang added a comment -

A new patch that addresses some of the comments.

I cannot resolve all the comments in this one big patch. Please move the rest to follow-up jiras.

He Yongqiang added a comment -

Need to open follow-up jiras for:
1. Avoid embedding objects inside of other objects except where necessary.
2. revokeAllPrivileges should revoke role grants as well
3. Role cycle is not being prevented
4. try/finally around transactions in ObjectStore should be used consistently
5. more negative tests
6. GRANT should mark WriteEntity for replication etc
7. Provide a way to make partitions inherit from table (and make it the default)
8. Multiple grants from the same grantor to the same grantee should not result in duplicates

He Yongqiang added a comment -

Let's get this in asap and do follow-ups. It is really painful to maintain this patch.
There have not been many big changes since the first patch; it has just needed to be refreshed every few weeks after the earlier patches.

John Sichi added a comment -

          We can't take the size of a patch as a justification for checking in code which doesn't pass review, especially for things like JDO and Thrift API's which are going to be there forever. I discussed it with Namit and his suggestion was to break it down into smaller patches to be committed in sequence so that we can divide-and-conquer the review process. For future projects, it would be great if we can do the same for the design process itself so that the coding doesn't get too far ahead of the design (which is how we end up with giant patches).

The items below are OK for follow-ups:

          2. revokeAllPrivileges should revoke role grants as well
          3. Role cycle is not being prevented
          6. GRANT should mark WriteEntity for replication etc

          For this one, we should at least work out the metastore model as part of the JDO changes:

          7. Provide a way to make partitions inherit from table (and make it the default)

          The rest need to be addressed up front as part of the relevant patches.

          Separately, maybe using git for branch+merge would help make development of a feature of this size more manageable? (If you're not already.)

          He Yongqiang added a comment -

No, I do not think I need to make changes in the short term for the JDO and Thrift APIs. If you want, do follow-ups on them.

          7. Provide a way to make partitions inherit from table (and make it the default)
          This can be done in a follow-up jira.

          He Yongqiang added a comment -

          By "If you want, do follow ups on them." I meant "if you want, open follow up jiras and assign to me"

Here are some points on why they are not easy to do:
For JDO embedding:
Most of the new objects embed a Table object, Database object, or Partition object.

If we only keep names for them, that is fine for Database. But for Table we need dbName and tableName, and for Partition we need dbName, tableName, and partName.
We would also need to fetch the object on the client side to check whether it exists, and then pass the names to the metastore, which would do another lookup to find the ids for the db/tbl/part to put into the new objects.

For the Thrift APIs, one benefit of consolidating them into one is reducing the number of APIs.

          John Sichi added a comment -

          Regarding pass-by-name vs pass-by-value for object references in the Thrift API, take a look at how drop table works. We already fetch the table descriptor in DDLTask (so that we can include its info in the posthook). But then, when we drop the table, we pass dbname+tblname (not the actual table object). So I don't see the need to invent a new pattern here.

          For dealing with compound names, it's fine to define a new struct ObjectReference with object type plus various optional components, then pass that. (In the future, we could also decide to hide an ID in there for the lookup-skipping optimization you mention if it turns out to be warranted.)
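
For illustration, a minimal Thrift sketch of that ObjectReference idea might look like the following (the enum and field names here are hypothetical placeholders, not the committed definitions):

// Hypothetical sketch only: an object type plus optional name
// components, with only the components relevant to the type set.
enum HiveObjectType {
  DATABASE = 1,
  TABLE = 2,
  PARTITION = 3
}

struct ObjectReference {
  1: HiveObjectType objectType,
  2: optional string dbName,
  3: optional string tableName,
  4: optional string partName
}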

          He Yongqiang added a comment -

          @John
Regarding the Thrift API's object embedding, do you mean defining some new object in Thrift like:

struct TableRef {
  1: string dbname,
  2: string tablename
}

and similarly for Partition?

          That sounds good to me.

          Alan Gates added a comment -

          There's been quite a bit of discussion back and forth in this JIRA on who owns the files (Hive or the user) and who MR jobs execute as. The answers to these questions are very important, but I wasn't able to decipher from the JIRA how they were answered. Was one approach or another selected?

          He Yongqiang added a comment -

I think this jira is just a first step towards a full security feature. It just does the metastore check to see whether a given user is able to issue the query or not.
There is no integration with the HDFS/MR part, so the file owner and the job executor are the same as now.
A long-term plan is to set up HiveServer.
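
For reference, a minimal hive-site.xml entry to turn this metastore-side check on might look like the following (assuming the hive.security.authorization.enabled property name introduced with this work):

<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
  <description>Enable or disable the Hive client authorization check.</description>
</property>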

          He Yongqiang added a comment -

          New patches.
          Addressed John's comments on must-do items.
Need to open follow-up jiras for:
1. revokeAllPrivileges should revoke role grants as well
2. Role cycle is not being prevented
3. GRANT should mark WriteEntity for replication etc
4. Group partitions according to table. If partition-level privileges are disabled, this can help perform just one check instead of using a loop over each partition.
5. Do an authorization check for 'grant/revoke' (an example of the statements in question is sketched below).
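
For context, the grant/revoke statements item 5 refers to look roughly like the following (a sketch of the new syntax; the exact grammar is whatever the committed parser defines):

GRANT SELECT ON TABLE t1 TO USER user1;
SHOW GRANT USER user1 ON TABLE t1;
REVOKE SELECT ON TABLE t1 FROM USER user1;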

          Alan Gates added a comment -

Having Hive own all the files and run all the jobs presents serious security issues, since UDFs would be running code as root. This would also pose problems for Howl, as Pig and MR can't run jobs as Hive. Maybe this isn't the right forum for this discussion. If there's a better one, let me know.

          John Sichi added a comment -

          @Alan: we discussed this in depth at the last Hive contributor meeting:

          http://wiki.apache.org/hadoop/Hive/Development/ContributorsMeetings/HiveContributorsMinutes101025

          Let's talk to Carl about scheduling the next one and make sure we find a timeslot where you can make it.

          John Sichi added a comment -

          @Yongqiang:

          New review comments in https://reviews.apache.org/r/183/

          The patch is applying cleanly for me now (I must have forgotten to svn up), so I'll do some testing later.

          John Sichi added a comment -

I added one comment about referring to "grantee" instead of "principal" in some of the APIs, but I did not do it consistently. I think it would be clearer across Thrift/JDO to distinguish the grantor from the grantee in all cases, but if you want to leave it as is, just ignore that comment.

          He Yongqiang added a comment -

A new no_thrift patch addresses John's review comments. Thanks John!

Running tests; will upload a new complete patch after the tests pass (and incorporate any new comments).

          John Sichi added a comment -

          A few more comments on patch 10 in

          https://reviews.apache.org/r/187/

          Namit Jain added a comment -

          hive-default.xml:

<property>
  <name>hive.variable.substitute</name>
  <value>true</value>
  <description>This enables substitution using syntax like ${var}, ${system:var} and ${env:var}.</description>
</property>

          seems like a merge problem.

          package.jdo:

          no index needed on ROLE_ID

          ALTER TABLE authorization_part SET TBLPROPERTIES ("PARTITION_LEVEL_PRIVILEGE"="TRUE");

Don't load partition-specific privileges for tables that do not have separate partition-level privileges.

          ObjectStore.java: add comments for getGrantObjects

HiveMetaStoreClient.java: no need for setEmpotyGrantList();
you should always create an empty list for a user, role, or group.

          DefaultHiveAuthorizationProvider.java:

Can you add comments for all the (private) functions?
It is not obvious what the meaning of the return value is.

          Still reviewing.

          Namit Jain added a comment -

          HadoopDefaultAuthenticator

          System.out.println() present

          PrivilegeObjectDesc.java:
          @Explain(displayName="privilege subject")

Can you use "Privilege Object" instead?

private String object; -> can you change it to tableName?

PrivilegeObjectDesc.java: should contain a list of columns.

Remove columns from PrivilegeDesc. -> PrivilegeDesc can be removed altogether; it is the same as Privilege.

          Namit Jain added a comment -

I think you can do the following optimization; feel free to do it in a follow-up.

There are many queries which have lots of input partitions for the same input table.
If the table under consideration has the same privileges for all of its partitions, you
don't need to check the permissions for every partition. You can find the common
tables and skip the partitions altogether.
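
A minimal, self-contained Java sketch of that grouping (table/partition objects simplified to strings; the partitionLevelPrivEnabled and authorize helpers are hypothetical stand-ins, not APIs from this patch):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionAuthSketch {

  // Stand-in for the real check against the table's
  // PARTITION_LEVEL_PRIVILEGE table property (hypothetical helper).
  static boolean partitionLevelPrivEnabled(String tableName) {
    return false;
  }

  // Stand-in for the real authorization-provider call (hypothetical helper).
  static void authorize(String objectName) {
    System.out.println("checking privileges on " + objectName);
  }

  public static void main(String[] args) {
    // Each entry is {tableName, partitionName}.
    List<String[]> inputPartitions = new ArrayList<String[]>();
    inputPartitions.add(new String[] { "t1", "ds=2011-01-01" });
    inputPartitions.add(new String[] { "t1", "ds=2011-01-02" });
    inputPartitions.add(new String[] { "t2", "ds=2011-01-01" });

    // Group the input partitions by their parent table.
    Map<String, List<String>> partsByTable = new HashMap<String, List<String>>();
    for (String[] p : inputPartitions) {
      List<String> parts = partsByTable.get(p[0]);
      if (parts == null) {
        parts = new ArrayList<String>();
        partsByTable.put(p[0], parts);
      }
      parts.add(p[1]);
    }

    // One check per table when partition-level privileges are off;
    // otherwise fall back to the per-partition loop.
    for (Map.Entry<String, List<String>> e : partsByTable.entrySet()) {
      if (!partitionLevelPrivEnabled(e.getKey())) {
        authorize(e.getKey());
      } else {
        for (String part : e.getValue()) {
          authorize(e.getKey() + "/" + part);
        }
      }
    }
  }
}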

          Namit Jain added a comment -

Can you check that 'USER', 'ROLE' and 'OPTION' are not used as column names in any table?

          Namit Jain added a comment -

My bad, I committed HIVE-1840 just now.
Can you regenerate the patch?

          Namit Jain added a comment -

I am getting some compilation errors - can you regenerate the patch?

          He Yongqiang added a comment -

          refresh the patch

          Ashutosh Chauhan added a comment -

John's latest comment on HIVE-1696 https://issues.apache.org/jira/browse/HIVE-1696?focusedCommentId=12978176&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12978176 seems to indicate that HIVE-1696 is blocked on this getting committed. Do we know how far along this issue is and how long it may take before it gets committed? That will help to estimate the commit date for HIVE-1696.

          Namit Jain added a comment -

All the tests are passing - we are blocked on the names of the new reserved words we have introduced.
We are trying to get it in asap.

          Ashutosh Chauhan added a comment -

          @Namit,

          Sounds good. Thanks for the info.

          He Yongqiang added a comment -

John shared this link http://www.antlr.org/wiki/pages/viewpage.action?pageId=1741 with me offline. The new patch uses the first option in that link to solve the keyword conflict.

Right now this is only done for the keywords user and role; if needed, we can open follow-ups for the others. But it would be better to try to bring the second option into Hive as well, or both, depending on the case. The second one is cleaner and could be the default option. This can be further investigated in follow-up jiras.
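
For reference, the first option amounts to letting selected keyword tokens also match wherever an identifier is expected. A minimal ANTLR sketch (the token names here are illustrative, not necessarily those in Hive.g):

// Allow the new reserved words to still be used as identifiers.
identifier
    : Identifier
    | KW_USER
    | KW_ROLE
    ;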

          He Yongqiang added a comment -

          refresh the patch

          He Yongqiang added a comment -

The last patch missed a few files and won't compile. Uploading a new one.

          Namit Jain added a comment -

          Committed. Thanks Yongqiang

He Yongqiang added a comment -

Here is the mysql upgrade script: http://wiki.apache.org/hadoop/Hive/AuthDev#A8._Metastore_upgrade_script_for_mysql
          Devaraj Das added a comment -

BTW, was any thought put into implementing the authorization checks in the ObjectStore? In the model where a MetaStore server is deployed separately, applications (map/reduce tasks, for example) can make programmatic calls to the MetaStore to, for example, drop random tables/partitions, and they will pass. Just wondering whether this use case was considered.


            People

            • Assignee:
              He Yongqiang
              Reporter:
              Ashish Thusoo
• Votes:
  0
  Watchers:
  21
