|
abstract class QueueAccessVerifier { enum QueueOperation { SUBMIT_JOB, LIST_JOBS, ADMINISTER_JOBS // .. } private Configuration conf; public void setConfiguration(Configuration conf) { this.conf = conf; } public Configuration getConfiguration() { return conf; } // basic initialization, for e.g. read up a conf file which has the list // of configured groups and users, and so on. public abstract void initialize(); // verify if the specified operation is allowed on the specified queue public abstract boolean verifyAccess(QueueOperation operation, String queue); }
Please let me know if this approach seems fine. I'd suggest keeping the implementation concrete until we have a more complete story for security. I'd suggest:
class QueueAccessVerifier extends Configured { enum QueueOperation { SUBMIT_JOB, ADMINISTER_JOBS // TODO Add LIST_JOBS when we have a http security story } // verify if the specified operation is allowed on the specified queue public boolean verifyAccess(QueueOperation operation, String queue, UserGroupInformation user) throws IOException {..} } Thinking bit more about this, I am wondering where the responsibility of managing ACLs should be. There seem to be two choices:
I am thinking that the second option is better. Logically queues do seem to be related closer to schedulers. Comments ? Hemant wrote:
> * An ACL can be specified as an (operation, userlist or grouplist) tuple for a given queue. > * The operations that seem relevant are: > o submit job > o administer jobs in a queue - including deleting any job or modifying any job's properties such as it's priority > o listing jobs We need add permissions to
The changeACL is a new operation - the users who are permitted this are effectively the queue owners.
Yes, this is what I meant by 'configuring a queue'. It means changing queue properties. In internal discussions, the consensus was that this operation will be controlled only by administrators of the system itself - and not delegated to others by setting up access control.
In the same discussions, it seemed like sysadmins would like to control this themselves as well. Again, there did not seem a need to set up access control. That said, it is really simple to add new operations in the design above when we feel we need it. That's one advantage with this approach.
This was indeed the option we considered at first. There were a couple of reasons, we chose the model I proposed over the file system ACLs model:
Hmm... it seems to me that a person who needs to do one of these, will in a different situation need to do the other as well. For instance, a job-admin might be asked to bump up the priority of a job, as it is realized that it needs to complete before others. This same admin might be asked to delete a job because it is stuck in front of the queue without progress and impeding the jobs after it. Can you explain your use-case about why you think these should be different ? I think that we should have the security in the framework rather than the scheduler. I think that will provide a better basis for security and prevent schedulers from having to deal with security, which would be problematic and error-prone.
I think it should be with the JobTracker. Queues are first-class citizens across the system. JT's submitJob() should look at access control - the job shouldn't even reach the scheduler unless it passes access control. Agree with Owen that we don't need Schedulers to deal with security. It's OK for both the JT and Schedulers to read queue-based information. Both can make use of the ResourceManagerConf class. There seems to be a consensus that managing ACLs should be with the JobTracker. This is Option 1 which I've mentioned above. I'm OK taking this route.
Like I mentioned in that option, this would mean that queue configuration would be split in two places
The simplest way of mapping the generic configuration with scheduler specific one is to link them using the queue name in some manner. In Considering these, I think we have two options regarding where to specify queues, and scheduler specific queue configuration:
To reflect the generic nature of queues, it seems better to take the second option. But I am open to comments. Hemant says:
>In the same discussions, it seemed like sysadmins would like to control this themselves as well. Again, there did not seem a need to set up access control. What do you mean by sysadmins? - do they have special priviledges beyond the ACL. I think it makes sense for the superuser and the supergroup (of HDFS) to have full permissions to job queues without the need for their username/groupname appear in the ACL. Is that what you meant by sysadmins? >That said, it is really simple to add new operations in the design above when we feel we need it. That's one advantage with this approach.
I'd favor the first option. Placing all the queue configuration in one file (resource-manager-conf.xml) makes it easier for someone to configure queues. They shouldn't care about what configuration is scheduler-specific and what is not. The individual components (JobTracker, Scheduler) can read whatever they need from this one file, or rather, from the one class that represents this file. You don't want this split (each component reading what it wants) to necessarily show up in the config file. If you move code around later, you'd end up moving config values between various files, which is not good.
Sanjay, yes. This is what I mean by sysadmins. They have complete control over the system, without needing special access control to be set up for them.
Is this because you believe that this operation can be delegated to others users/groups ? Also, for this version of the resource manager, we don't have a UI for administering the queues. In other words, even if we declare the operation right now, there is going to be no place where we need to check this. I suggest we add this as a TODO comment to the Enum I've declared above, on the lines of what Owen proposed for listjobs. Makes sense ? Assuming that queues will become part of the framework, I would like to propose the following mechanism to handle queue configuration. The requirements are:
This does have a disadvantage that queue names and their scheduler configuration are at different places, but that seems like a tradeoff required to keep some parts of the queue administration in the core framework itself. Does this sound right ? If yes, I propose to make the following changes:
Comments, please. my earlier comments were not quite right. Hemanth's point is valid: there is some queue configuration that is generic across all queues. This includes queue names and ACLs, for now. Then there is queue configuration that is different for different schedulers. We should put these two in separate files, though they're tied together by queue names, which need to be common across both.
Should the generic queue config be in resource-manager-conf or in hadoop-default/hadoop-site? Part of the reason for keeping queue-specific stuff in a separate file was to isolate changes to haddop-default, while we figure out a more generic way to do queue config (for which HADOOP-3579 was opened). That reasoning still sounds valid, so the generic queue config stuff should be in resource-manager-conf till we resolve 3579. I'd prefer just using hadoop-site.xml for the generic properties.
I'd suggest renaming the ResourceManagerConf to something like QueueManager and merging in QueueAccessVerifier. It would look something like: class QueueManager {
QueueManager(Configuration conf) { ... }
List<String> getQueueNames() { ... }
static enum QueueOperation { SUBMIT_JOB, ADMINISTER_JOBS }
boolean hasAccess(String queueName, UserGroupInformation user, QueueOperation action) { ... }
}
+1 for renaming ResourceManagerConf to QueueManager. The earlier name had a connotation that it was related to the Configuration or JobConf classes, while it was not.
I am not too keen on where the generic properties are defined. So, its a 1:1 vote for now.. will look to see if others have any specific preference. We also need: void refresh(Configuration conf); added to QueueManager. Just a minor comment here that while queues are considered part of the framework, they still are related to the scheduler that is configured. For e.g. the default scheduler JobQueueTaskScheduler does not support more than one queue. So, if someone configures two queue names, and leaves the scheduler as the default one, that might not work as expected. Probably the documentation of the queue names' configuration parameter should document that there is a relationship. Thanks to Sreekanth for pointing this out.
I am attaching a patch for early review. It isn't complete, and is more to illustrate how the implementation is going to look. Please share your comments.
The patch includes:
Tested this patch manually for job submission, and killing of job and tasks and it is working fine. Things to do:
Things to do, possibly outside this patch:
A more complete patch. Has unit tests and javadocs. Request to provide comments on the implementation.
Since the QueueManager is the class being used by the framework, some changes are required to ResourceManagerConf (used only by The JobClient should use UserGroupInformation.login(conf) rather than UnixUserGroupInformation.login(conf).
I think we should have an exception class to catch for login failures. So we either need to add LoginException to submitJob in JobClient or we should create a new IOException that is for login failures and put it into org.apache.hadoop.security. I'd lean toward changing the signature of run/submitJob. I'd suggest handling mapred.acls.enabled inside of the QueueManager instead of the JobTracker. The same for giving the user access to their own jobs. That should be done in QueueManager, so that all of the rules & exceptions for allowing actions are contained inside the one class. The JobTracker should use the standard method for getting the queue of the job instead of using the attribute name "queue.name". Why does the TaskScheduler need the QueueManager? That seems really unexpected. New patch incorporating review comments.
I had an offline discussion with Owen on his review comments. Here's the summary, and explanation of the changes in the patch with respect to these.
The JobClient code was already using UnixUserGroupInformation.login(conf, saveToConf). There was no API equivalent to this one in UserGroupInformation. Discussing options, we decided it's best to have a new JIRA to address this issue in UserGroupInformation. I will open the JIRA and link it off here. Until then, because this issue is holding up other JIRAs, we decided to retain the current code as is.
Because run/submitJob throwing a new Exception would cause backwards incompatibility, we decided to go with the other option, that is create a sub-class of IOException in org.apache.hadoop.security. Looking at it, I've seen there's already an AccessControlException in org.apache.hadoop.fs.permission. Since both classes are identical, I think we should leave one (possibly the one being defined here), and deprecate and ultimately replace the other. This could be part of the cleanup JIRA being opened above. Thoughts ?
Done this. I've defined a new API in QueueManager to handle this. boolean hasAccess(String queueName, JobInProgress job, UserGroupInformation user, QueueOperation action) { ... }
This was being done in
This was done as discussed in Minor changes to earlier patch - mainly added 2 additional tests to test submitting jobs to non default queues, and also included the queue name property in hadoop-default.xml.
This is looking good. Pushing through HadoopQA.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12389299/HADOOP-3698.patch against trunk revision 691306. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3163/testReport/ This message is automatically generated. Thanks for the verification, Owen. The attached patch fixes the test failures. One of the things I realized is that the name AccessControlException was clashing in some test cases with the class of the same name in org.apache.hadoop.fs.permissions package, and it seemed dangerous to include the same named class. So, I've modified the name to AccessControlIOException, which seems fair as it extends from an IOException.
Running through Hudson again.
Hudson did not pick this patch up. Adding again.
Output of ant test-patch:
[exec] +1 overall. [exec] +1 @author. The patch does not contain any @author tags. [exec] +1 tests included. The patch appears to include 9 new or modified tests. [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. I also ran ant test and ant contrib-test. There was one failure for TestKosmosFileSystem. But that was because of I just committed this. Thanks, Hemanth!
Integrated in Hadoop-trunk #595 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/595/
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Firstly, as mentioned in http://issues.apache.org/jira/browse/HADOOP-3479?focusedCommentId=12602243#action_12602243
, it possibly makes sense to make the access control mechanism something pluggable at a class design level.
One concrete and simple mechanism is as follows:
Currently, administering the queue itself (that is - operations such as adding / deleting or configuring a queue) is an admin only operation which is controlled by means of access to the queue configuration file defined in
HADOOP-3479.Thoughts ?