I propose a new abstract class, Groups, with a single method, getGroups, as below:

Groups.java:

    public abstract class Groups {
        public abstract List<String> getGroups(String username);
    }

along with a concrete implementation that gets the Unix groups for the given user. This is a preliminary patch while I continue testing.
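To make the proposal concrete, here is a minimal sketch of what a shell-based implementation might look like. It is not the code from the patch: the class name ShellUnixGroups and the helper parseGroups are hypothetical, the classes are package-private only so the sketch compiles from a single file, and it assumes an `id` binary that supports `-Gn` and group names without whitespace.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// The abstraction proposed above (IOException added here as an assumption,
// since any OS lookup can fail).
abstract class Groups {
    abstract List<String> getGroups(String username) throws IOException;
}

// Hypothetical concrete implementation: asks the OS via `id -Gn <user>`.
class ShellUnixGroups extends Groups {

    // Splits the whitespace-separated output of `id -Gn` into group names.
    static List<String> parseGroups(String idOutput) {
        List<String> groups = new ArrayList<>();
        for (String token : idOutput.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                groups.add(token);
            }
        }
        return groups;
    }

    @Override
    List<String> getGroups(String username) throws IOException {
        // Arguments are passed as a list, so the username is never
        // interpreted by a shell. "id" is resolved via PATH here; a real
        // implementation may prefer an absolute path.
        Process p = new ProcessBuilder("id", "-Gn", username)
                .redirectErrorStream(true)
                .start();
        String output = new String(p.getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);
        try {
            if (p.waitFor() != 0) {
                throw new IOException("id failed for " + username + ": " + output.trim());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for id", e);
        }
        return parseGroups(output);
    }
}
```

Keeping the parsing separate from the process call makes the mapping easy to test without forking, and leaves room to swap in a different lookup behind the same Groups interface.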
> I propose we change the IPC Client to send the JAAS Subject in the header rather than UGI; this will also be compatible with the way we will do Kerberos-based authentication via the GSS API.
Just want to clarify that application code doesn't send anything itself when using Kerberos; it is all hidden inside the GSS API library. After authentication, the server can query the established GSS context to get the client ID as a GSSName, which can be converted to a String. So for compatibility, the IPC Client doesn't have to send the JAAS Subject in the header. Sending a String is fine.

Groups should definitely come from asking the host OS in some form, using the Java equivalent of getgrent() and friends. [Be aware that getgroups() is BSD-specific and may not be available on System V systems, such as Solaris and HP-UX.] Doing this via a shell call-out is just going to exacerbate the memory problems we already see, especially on the secondary name node, which requires more memory than the primary due to the fork of whoami/id!

It also opens up yet another security hole, where any random groups command on the name node's path can be used to override. Not Good(tm).

Privately, someone asked about caching the group content.
One of the big advantages of talking to the OS is that many systems include a name service cache daemon that handles caching group and similar content for the entire machine. nscd generally includes great support for controlling the size, TTL, negative TTL, etc., for the cache. Duplicating that functionality seems like overkill and, worse, will act as a cache against a cache!

Please consider passing the authentication context to the getGroups() method, as it might be easier to retrieve the associated groups using that information than based only on the username. For example, in POSIX environments it is faster to do a lookup based on the [...] If you are using Kerberos with PAC, then the authentication context may already [...] There is a similar solution based on X509 authentication, where the associated [...]

AFAIK, Hadoop has no concept of uid, as everything in HDFS, etc., is stored as a string. So the username is about all the context you can probably get.
Preliminary patch, with some testing done.
This patch will create two versions of SecurityUtil.getSubject: one that builds the list of group principals from the UGI group list, and another that builds the list from the UNIX id command. Do we really need the first one? I suggest we remove it.
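For readers unfamiliar with JAAS, the following sketch shows roughly what building a Subject from a username and a list of group names involves. It is not the SecurityUtil.getSubject code from the patch: the classes SketchUserPrincipal, SketchGroupPrincipal, and SubjectSketch are hypothetical names, and the group list is simply passed in rather than looked up.

```java
import java.security.Principal;
import java.util.List;
import java.util.Objects;
import javax.security.auth.Subject;

// Hypothetical base class: a principal identified only by its name and type.
abstract class NamedPrincipal implements Principal {
    private final String name;

    NamedPrincipal(String name) {
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public String getName() {
        return name;
    }

    @Override
    public boolean equals(Object o) {
        return o != null && o.getClass() == getClass()
                && ((NamedPrincipal) o).name.equals(name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(getClass(), name);
    }

    @Override
    public String toString() {
        return getClass().getSimpleName() + "[" + name + "]";
    }
}

// Hypothetical principal types for the user and for each group.
final class SketchUserPrincipal extends NamedPrincipal {
    SketchUserPrincipal(String name) { super(name); }
}

final class SketchGroupPrincipal extends NamedPrincipal {
    SketchGroupPrincipal(String name) { super(name); }
}

final class SubjectSketch {
    // Builds a Subject holding one user principal plus one principal per group.
    static Subject subjectFor(String username, List<String> groups) {
        Subject subject = new Subject();
        subject.getPrincipals().add(new SketchUserPrincipal(username));
        for (String group : groups) {
            subject.getPrincipals().add(new SketchGroupPrincipal(group));
        }
        return subject;
    }
}
```

With a single entry point like this, the source of the group list (UGI or the id command) becomes a detail of the caller, which is one way to avoid maintaining two variants of the method.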
HADOOP-4348 is switching IPC to use the JAAS Subject rather than UGI (which will become an internal artifact). While we are adding the user-to-group mapping service, I propose we change the IPC Client to send the JAAS Subject in the header rather than UGI; this will also be compatible with the way we will do Kerberos-based authentication via the GSS API.
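On the Kerberos side, one comment in this thread notes that the server can read the client identity from the established GSS context and convert it to a String. A minimal sketch of that step, using the standard org.ietf.jgss API, could look like the following. The class GssClientId and the helper shortName are hypothetical, and trimming the principal to its first component is only an assumption about how a username for getGroups might be derived, not something this issue specifies.

```java
import org.ietf.jgss.GSSContext;
import org.ietf.jgss.GSSException;
import org.ietf.jgss.GSSName;

final class GssClientId {

    // On the accepting (server) side, getSrcName() is the initiator,
    // i.e. the authenticated client.
    static String clientIdOf(GSSContext context) throws GSSException {
        if (!context.isEstablished()) {
            throw new IllegalStateException("GSS context is not established yet");
        }
        GSSName client = context.getSrcName();
        return client.toString();
    }

    // Assumption: reduce "user/instance@REALM" or "user@REALM" to "user"
    // so it can be passed to a username-based group lookup.
    static String shortName(String principal) {
        int end = principal.length();
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash >= 0) {
            end = Math.min(end, slash);
        }
        if (at >= 0) {
            end = Math.min(end, at);
        }
        return principal.substring(0, end);
    }
}
```

A server would call clientIdOf only after the context handshake completes, then hand the resulting name (possibly shortened) to the group mapping service.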