However, this OS doesn't provide a way for a user to find out what groups they're a member of from the OS's perspective.
Hadoop doesn't ultimately 'own' those resources. They come from an external source. Where does username-to-uid mapping kick in for running tasks? All of these mappings sit outside Hadoop.
But, do you also consider having different users/groups on the client machine vs the NN to be a misconfiguration? That seems like a perfectly reasonable setup to me, and one that we should support.
Yes, very much so. It breaks a lot of different services (not just Hadoop) and is confusing to users. If one insists on this foolishness, there are other ways to solve this problem that don't involve us. This is a snowflake that could easily turn into an avalanche.
FWIW, we also allow laptops to connect directly to one of our Hadoop grids. So yes, I'm very familiar and have thought a lot about this particular problem already. That's why we have other services that allow users to see what groups they belong to, their user id, etc, etc.
Again: why do we need to provide a solution inherent in the software for what is ultimately a problem that a) is much larger than our software and b) can be solved without us doing anything? Just because we can do something doesn't mean we should.
Perhaps, but Hadoop also supports making the user -> group mapping service pluggable via the hadoop.security.group.mapping configuration parameter. Why should we require implementers of this to provide a way of querying this information on their own, through some other mechanism, rather than have Hadoop show it? When a Hadoop user gets a "permission denied" error from a Hadoop command, and wants to know what groups Hadoop thinks they belong to, they'll have to run "random-command-x" rather than something simple like "hadoop fs -groups". That only seems to make Hadoop harder to use.
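To make the proposal above concrete, here is a rough sketch of what "ask the configured mapping and print the answer" amounts to. It is not Hadoop code: `static_mapping` and `show_groups` are hypothetical names, the in-memory table stands in for whatever implementation hadoop.security.group.mapping points at, and the output format is an assumption rather than what any real command prints.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a pluggable user -> groups service.
# Any callable taking a username and returning a list of group names fits.
GroupMapping = Callable[[str], List[str]]


def static_mapping(table: Dict[str, List[str]]) -> GroupMapping:
    """Build a mapping backed by an in-memory table (illustration only)."""
    def lookup(user: str) -> List[str]:
        # Unknown users simply have no groups in this sketch.
        return list(table.get(user, []))
    return lookup


def show_groups(user: str, mapping: GroupMapping) -> str:
    """Format the groups the mapping reports for a user (assumed format)."""
    return f"{user} : {' '.join(mapping(user))}"


if __name__ == "__main__":
    mapping = static_mapping({"joe": ["users", "hadoop"]})
    print(show_groups("joe", mapping))
```

The point of the sketch is only that the display side never needs to know where the groups came from. Whether the plug-in can always return something displayable is a separate question.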
If someone writes a pluggable module, then this is something that needs to get factored into the cost of using that plug-in. What happens if those groups aren't in a displayable format?
Also, what happens if they aren't using the command line? Are we going to write a jsp too? This is going to quickly balloon out of control.
Hadoop assumes that file system implementations are capable of associating files and directories with users and groups, as HDFS does.
Sort of. There is no reason why a file system's implementation of users and groups couldn't be a nop. (Actually, isn't that the case for S3, Cassandra, and a few other non-POSIX-likes already?) What do we display in the case where the group is useless information for the file system in use?
My point is just that Hadoop isn't hiding this information as it stands. Hadoop makes decisions based on the groups a user belongs to, so we should make it easy for our users to find out what groups Hadoop thinks they belong to.
...except Hadoop is told what groups a user belongs to by an external source. Why shouldn't it be the responsibility of the external source to share this information? We're the consumer, not the provider when it comes to naming services.
Showing the username seems reasonable to me, and in fact the patch I'm working on displays this. Hadoop doesn't make decisions based on one's UID, so why should we show that?
I don't follow this reasoning. Kerberos doesn't have any notion of groups. But, the first component of the Kerberos principal name is used as the username when the NN and JT determine a user's groups. I don't see how we need to account for anything differently with or without Kerberos support enabled.
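For reference, "first component of the principal name" can be sketched as below. This is a simplification, assuming principals of the form primary/instance@REALM and no remapping rules in effect; `short_name` is a hypothetical helper, not a Hadoop API.

```python
def short_name(principal: str) -> str:
    """Return the first component of a Kerberos-style principal name.

    Assumes the form primary[/instance][@REALM]; ignores any remapping rules.
    """
    # Drop the realm, if present.
    name = principal.split("@", 1)[0]
    # Keep only the part before the first "/".
    return name.split("/", 1)[0]


if __name__ == "__main__":
    print(short_name("joe/host.example.com@EXAMPLE.COM"))
    print(short_name("joe@EXAMPLE.COM"))
```

Under this assumption the username handed to the group lookup is just that first component, whatever the instance or realm happens to be.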
Look at the bigger picture and not just focus on groups for a second:
Let's say I fire my job off with a principal of user/joe. But thanks to remapping (HADOOP-6526), the task actually gets run as username fred with a uid of 50. I access a file on the local system (or heck, even NFS) that is not readable by fred/50. Using the same logic of "oh noes users don't know their groups", we should be reporting this other information too.
This is a slippery slope and I really really don't think we want to go down this road.
(PS, some Kerberos implementations actually do pass group information along...)