
Key: HADOOP-4656
Type: Improvement
Status: Open
Priority: Major
Assignee: Boris Shkolnik
Reporter: Arun C Murthy
Votes: 0
Watchers: 13
Hadoop Common

Add a user to groups mapping service

Created: 13/Nov/08 10:15 PM   Updated: 21/Nov/09 12:43 AM
Component/s: security
Affects Version/s: 0.19.0
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments (licensed for inclusion in ASF works):
HADOOP-4656.patch   2009-10-08 08:04 AM   Arun C Murthy   14 kB
HADOOP-4656_0_20090108.patch   2009-01-08 09:16 AM   Arun C Murthy   7 kB

Description
Currently the IPC client sends the UGI, which contains the user and group information, to the Server. However, those groups are the user's groups as seen on the client side. The more pertinent user-to-groups mapping is the one seen by the Server. Hence the client should send only the user, and we should add a 'group mapping service' that the Server can query for the mapping.

Arun C Murthy added a comment - 14/Nov/08 05:38 PM
HADOOP-4348 is switching IPC to use the JAAS Subject rather than UGI (which will become an internal artifact). While we are adding the user-to-group mapping service, I propose we change the IPC Client to send the JAAS Subject in the header rather than UGI. This will also be compatible with the way we will do Kerberos-based authentication via the GSS API.

Arun C Murthy added a comment - 08/Jan/09 08:01 AM
I propose a new abstract class Groups with a single method 'getGroups' as below:
Groups.java
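import java.util.List;

public abstract class Groups {
  // Returns the names of the groups the given user belongs to.
  public abstract List<String> getGroups(String username);
}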

with a concrete implementation which gets the unix groups for the given user.
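
As an illustration of what such a concrete implementation might look like, here is a minimal sketch that shells out to the POSIX `id -Gn <username>` command. The class name `ShellUnixGroups` and the `parseGroups` helper are hypothetical and not taken from the attached patch. The sketch is kept standalone rather than extending Groups so that it compiles on its own, and it assumes group names contain no whitespace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the class name and the choice of `id -Gn` are
// assumptions for illustration, not the contents of the patch.
class ShellUnixGroups {

  // Splits the whitespace-separated output of `id -Gn` into group names.
  static List<String> parseGroups(String idOutput) {
    List<String> groups = new ArrayList<>();
    for (String token : idOutput.trim().split("\\s+")) {
      if (!token.isEmpty()) {
        groups.add(token);
      }
    }
    return groups;
  }

  // Runs `id -Gn <username>` on the local host and returns the user's groups.
  // "id" is resolved through the process PATH here.
  public List<String> getGroups(String username)
      throws IOException, InterruptedException {
    Process p = new ProcessBuilder("id", "-Gn", username)
        .redirectErrorStream(true)
        .start();
    StringBuilder out = new StringBuilder();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = r.readLine()) != null) {
        out.append(line).append(' ');
      }
    }
    if (p.waitFor() != 0) {
      throw new IOException("id -Gn failed for " + username + ": " + out);
    }
    return parseGroups(out.toString());
  }
}
```

Keeping the parsing separate from the process call makes the string handling testable without forking a process.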


Arun C Murthy added a comment - 08/Jan/09 09:16 AM
Preliminary patch while I continue testing.

Kan Zhang added a comment - 08/Jan/09 06:32 PM
> I propose we change the IPC Client to send the JAAS Subject in the header rather than UGI, this will also be compatible with the way we will do Kerberos-based authentication via the GSS API.

Just want to clarify that application code doesn't send anything when using Kerberos. It's all hidden inside the GSS API library. After authentication, the server can query the established GSS context to get the client ID as a GSSName, which can be converted to a String. So for compatibility, the IPC Client doesn't have to send the JAAS Subject in the header. Sending a String is fine.


Allen Wittenauer added a comment - 12/Jan/09 07:51 PM
Groups should definitely come from asking the host OS in some form using the Java equivalent of getgrent() and friends. [ Be aware that getgroups() is BSD-specific and may not be available on System V, such as Solaris and HP-UX.] Doing this via a shell call-out is just going to exacerbate the memory problems we already see, especially on the secondary name node, which requires more memory than the primary due to the fork of whoami/id!

It also opens up yet another security hole, where any random groups command on the name node's PATH can be used to override the real one. Not Good(tm).


Allen Wittenauer added a comment - 13/Jan/09 05:37 PM
Privately, someone asked about caching the group content.

One of the big advantages of talking to the OS is that many systems include a naming services caching daemon that handles caching group and similar content for the entire machine. nscd generally includes great support for controlling the size, ttl, negative ttl, etc, for the cache. Duplicating that functionality seems like overkill and, worse, will act as a cache against a cache!
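
For comparison, an in-process cache of the kind being asked about might look roughly like the sketch below. All names (`CachedGroups`, the `lookup` function, the injected `clock`) are hypothetical and not from the patch. It shows the TTL bookkeeping such a cache would have to carry; when the underlying lookup goes through the OS, entries here sit on top of whatever nscd has already cached.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch of a TTL cache in front of a user-to-groups lookup.
final class CachedGroups {

  // One cached answer plus the time at which it stops being valid.
  private static final class Entry {
    final List<String> groups;
    final long expiresAtMillis;

    Entry(List<String> groups, long expiresAtMillis) {
      this.groups = groups;
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  private final Function<String, List<String>> lookup; // the real lookup
  private final long ttlMillis;
  private final LongSupplier clock; // injected so expiry can be tested
  private final Map<String, Entry> cache = new ConcurrentHashMap<>();

  CachedGroups(Function<String, List<String>> lookup, long ttlMillis,
      LongSupplier clock) {
    this.lookup = lookup;
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  // Returns the cached groups, refreshing them once the TTL has elapsed.
  List<String> getGroups(String username) {
    long now = clock.getAsLong();
    Entry e = cache.get(username);
    if (e == null || now >= e.expiresAtMillis) {
      e = new Entry(lookup.apply(username), now + ttlMillis);
      cache.put(username, e);
    }
    return e.groups;
  }
}
```

In production the clock would typically be `System::currentTimeMillis`. Negative caching and size limits, which nscd also handles, are left out of the sketch.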


Kan Zhang added a comment - 26/Apr/09 10:30 PM
Arun, can we get this one done soon? I'm working on 4343, which depends on this. Thanks.

FROHNER Ákos added a comment - 14/Sep/09 03:51 PM
Please consider passing the authentication context to the getGroups() method,
as it might be easier to retrieve the associated groups using that information
than based only on the username.

For example in POSIX environments it is faster to do a lookup based on the
numeric UID, than based on the username.

If you are using Kerberos with PAC, then the authentication context may already
contain a list of associated groups:
http://k5wiki.kerberos.org/wiki/Projects/PAC_and_principal_APIs

There is a similar solution based on X509 authentication, where the associated
list of groups is embedded into the authentication context.
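
A rough sketch of what this suggestion could look like, assuming a hypothetical `AuthContext` type that carries the username and, when the authentication mechanism supplies them, a list of groups. Neither `AuthContext` nor `ContextAwareGroups` exists in the patch; they are illustrative only.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical container for what the authentication layer learned.
final class AuthContext {
  final String username;
  // Groups asserted by the mechanism (e.g. from a PAC or an X509
  // attribute), or null if the mechanism supplied none.
  final List<String> assertedGroups;

  AuthContext(String username, List<String> assertedGroups) {
    this.username = username;
    this.assertedGroups = assertedGroups;
  }
}

// Hypothetical mapping service that prefers groups already present in the
// context and falls back to a lookup by username otherwise.
final class ContextAwareGroups {
  private final Function<String, List<String>> byUsername;

  ContextAwareGroups(Function<String, List<String>> byUsername) {
    this.byUsername = byUsername;
  }

  List<String> getGroups(AuthContext ctx) {
    if (ctx.assertedGroups != null) {
      return ctx.assertedGroups;
    }
    return byUsername.apply(ctx.username);
  }
}
```

The fallback keeps the username-only behaviour for mechanisms that carry no group information.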


Allen Wittenauer added a comment - 14/Sep/09 05:59 PM
AFAIK, Hadoop has no concept of uid, as everything in the HDFS, etc, is stored as a string. So the username is about all the context you can probably get.

Arun C Murthy added a comment - 08/Oct/09 08:04 AM
Preliminary patch, with some testing done.

Boris Shkolnik added a comment - 21/Nov/09 12:43 AM
This patch will create two versions of SecurityUtil.getSubject: one that builds the list of group principals from the UGI group list, and another that builds the list from the UNIX id command. Do we really need the first one? I suggest we remove it.