HDFS has its own authorization. So even if we allow an access in the Hive layer and pass it on to HDFS (by setting the correct HDFS user name and groups), the job can still fail with an HDFS permission error.
So we need to solve the problem of having two independent authorization layers.
One way is to allow all accesses in HDFS and let Hive do the authorization, so Hive runs as root (the superuser) in terms of HDFS.
The other way is to plug HDFS authorization into the Hive layer and only accept an access if both Hive and HDFS say YES. A user can belong to several Unix groups, and HDFS permissions are set based on the Unix group. [I am not sure how many groups a user can have in terms of HDFS, i.e. how many group settings you can put on an HDFS file. Let's simply say I want these 2 groups to be able to read the file.] Another problem is column-level privileges: HDFS permissions apply to whole files, so they cannot express privileges on individual columns.
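A minimal sketch of the "both Hive and HDFS must say YES" option, to make the failure in the first paragraph concrete. It assumes the classic POSIX-style HDFS model (one owner, one group, rwx bits per file) and ignores the HDFS superuser and HDFS ACLs; `hive_allows`, `hdfs_allows`, the grant set, and all user and group names are hypothetical and are not Hive's or Hadoop's API.

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1  # POSIX-style permission bits

@dataclass(frozen=True)
class HdfsFile:
    owner: str
    group: str  # classic HDFS permissions: a single group per file
    mode: int   # e.g. 0o750

def hdfs_allows(f: HdfsFile, user: str, groups: list, perm: int) -> bool:
    # Pick the owner, group, or "other" bits, as in the POSIX model.
    # Simplified: ignores the HDFS superuser and ACLs.
    if user == f.owner:
        bits = (f.mode >> 6) & 7
    elif f.group in groups:
        bits = (f.mode >> 3) & 7
    else:
        bits = f.mode & 7
    return (bits & perm) == perm

def hive_allows(grants: set, user: str, groups: list, table: str, privilege: str) -> bool:
    # Hypothetical Hive-side check against a set of (principal, table, privilege) grants.
    return any((p, table, privilege) in grants for p in [user, *groups])

def authorize(grants, f, user, groups, table, privilege, perm) -> bool:
    # Accept only if BOTH layers say YES.
    return (hive_allows(grants, user, groups, table, privilege)
            and hdfs_allows(f, user, groups, perm))

# Hypothetical setup: Hive grants SELECT to 'etl', but the file's group is 'analysts'.
grants = {("etl", "db_name.T", "SELECT")}
t_file = HdfsFile(owner="hive", group="analysts", mode=0o750)

print(authorize(grants, t_file, "carol", ["etl"], "db_name.T", "SELECT", READ))
print(authorize(grants, t_file, "dave", ["analysts", "etl"], "db_name.T", "SELECT", READ))
```

Here "carol" passes the Hive check but fails the HDFS check, which is exactly the two-layer mismatch described above; "dave" passes both because he is also in the file's single group.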
This is very much open for discussion; please comment on it.
About the proposal, there is one authorization rule that we are not sure about: the simple rule "one deny, then deny", i.e. a single DENY overrides any number of grants.
Consider this example:
5.3.1 I want to grant everyone (new people may join at any time) access to db_name.*, and later I want to protect one table, db_name.T, from ALL users except a few:
1) Add all users to a group 'users' (assumption: new users automatically join this group), and grant 'users' ALL privileges on db_name.*.
2) Add those few users to a new group 'users2', AND REMOVE them from 'users'.
3) DENY 'users' on db_name.T.
4) Grant ALL on db_name.T to 'users2'.
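The main problem with this approach is that "REMOVE them from 'users'" is not practical. Yet the removal is required: under "one deny, then deny", anyone still in 'users' is caught by the DENY in step 3, which overrides the grant to 'users2' in step 4.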
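A minimal sketch of "one deny, then deny" applied to the 5.3.1 example, showing why step 2 has to remove the few users from 'users'. The `Entry` record, the "db.*" / "db.table" object naming, and the user names are hypothetical simplifications (only table-level targets, and "ALL" covers any privilege), not the proposal's actual tables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    principal: str  # user or group name
    obj: str        # "db.*" for a whole database, "db.table" for one table (hypothetical)
    privilege: str  # e.g. "ALL"
    deny: bool      # True for a DENY entry, False for a GRANT

def covers(obj: str, target: str) -> bool:
    """Does an entry on `obj` apply to the table `target` ("db.table")?"""
    if obj.endswith(".*"):
        return target.split(".", 1)[0] == obj[:-2]
    return obj == target

def one_deny_then_deny(entries, user, groups, target, privilege) -> str:
    principals = {user, *groups}
    relevant = [e for e in entries
                if e.principal in principals
                and covers(e.obj, target)
                and e.privilege in (privilege, "ALL")]
    if any(e.deny for e in relevant):
        return "DENY"    # a single DENY anywhere wins
    if relevant:
        return "ACCEPT"  # at least one GRANT and no DENY
    return "DENY"        # nothing granted at all

entries = [
    Entry("users", "db_name.*", "ALL", deny=False),   # step 1
    Entry("users", "db_name.T", "ALL", deny=True),    # step 3
    Entry("users2", "db_name.T", "ALL", deny=False),  # step 4
]

print(one_deny_then_deny(entries, "alice", ["users", "users2"], "db_name.T", "SELECT"))  # DENY
print(one_deny_then_deny(entries, "alice", ["users2"], "db_name.T", "SELECT"))           # ACCEPT
```

As long as "alice" stays in 'users', the DENY from step 3 wins even though 'users2' grants ALL; only after she is removed from 'users' does she get ACCEPT.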
The other option we have thought about is a different rule.
First, try the user name:
first try to deny the access by looking up the deny tables by user name:
1. If there is an entry in 'user' that denies this access, return DENY
2. If there is an entry in 'db' that denies this access, return DENY
3. If there is an entry in 'table' that denies this access, return DENY
4. If there is an entry in 'column' that denies this access, return DENY
If we get one DENY, we return DENY for this attempt.
If no deny entry is found, go through all privilege levels with the user name:
5. If there is an entry in 'user' that accepts this access, return ACCEPT
6. If there is an entry in 'db' that accepts this access, return ACCEPT
7. If there is an entry in 'table' that accepts this access, return ACCEPT
8. If there is an entry in 'column' that accepts this access, return ACCEPT
Second, try the user's group/role names one by one until we get an ACCEPT. If we get an ACCEPT from any group/role, we ACCEPT the access; otherwise we DENY it.
For each group/role, we run the same routine as we did for the user name.
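A minimal sketch of the alternative rule (steps 1-8 per principal, then groups/roles until an ACCEPT), run on the same 5.3.1 setup but without removing anyone from 'users'. The text leaves open whether a user-name DENY is final; this sketch assumes the literal reading, where a DENY only ends the check for that principal and the groups/roles are still tried. `Entry`, `LEVELS`, the tuple object keys, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Privilege levels, in the order of steps 1-4 and 5-8.
LEVELS = ("user", "db", "table", "column")

@dataclass(frozen=True)
class Entry:
    principal: str        # user, group, or role name
    level: str            # one of LEVELS
    obj: Tuple[str, ...]  # () for 'user', (db,), (db, table), (db, table, column)
    privilege: str        # e.g. "SELECT" or "ALL"
    deny: bool            # True for a DENY entry, False for a GRANT

def matches(e: Entry, principal: str, level: str,
            target: Tuple[str, str, Optional[str]], privilege: str) -> bool:
    # An entry at a level applies if its object equals the target's prefix of that depth.
    depth = LEVELS.index(level)
    return (e.principal == principal and e.level == level
            and e.obj == target[:depth]
            and e.privilege in (privilege, "ALL"))

def check_principal(entries, principal, target, privilege) -> Optional[str]:
    """Steps 1-8 for one principal: 'DENY', 'ACCEPT', or None if nothing matches."""
    for level in LEVELS:  # steps 1-4: look for a deny entry
        if any(e.deny and matches(e, principal, level, target, privilege) for e in entries):
            return "DENY"
    for level in LEVELS:  # steps 5-8: look for an accept entry
        if any(not e.deny and matches(e, principal, level, target, privilege) for e in entries):
            return "ACCEPT"
    return None

def authorize(entries, user, groups, target, privilege) -> str:
    # Try the user name first, then each group/role, until one returns ACCEPT.
    # Assumption: a DENY only ends the check for that principal, not the request.
    for principal in [user, *groups]:
        if check_principal(entries, principal, target, privilege) == "ACCEPT":
            return "ACCEPT"
    return "DENY"

# The 5.3.1 example; the few users stay in 'users' and are also added to 'users2'.
entries = [
    Entry("users", "db", ("db_name",), "ALL", deny=False),
    Entry("users", "table", ("db_name", "T"), "ALL", deny=True),
    Entry("users2", "table", ("db_name", "T"), "ALL", deny=False),
]

print(authorize(entries, "alice", ["users", "users2"], ("db_name", "T", None), "SELECT"))  # ACCEPT
print(authorize(entries, "bob", ["users"], ("db_name", "T", None), "SELECT"))              # DENY
```

Because an ACCEPT from any one group/role is enough, 'users2' can override the DENY on 'users' without touching group membership, which is what the 5.3.1 scenario needs; the cost is the extra complexity noted below.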
The problem with this approach is that it is a bit complex, and we have not found any system that uses it. MySQL has no DENY at all; SQL Server uses "one deny, then deny".