I would like to call out some approaches this patch is taking explicitly, so that there's an opportunity for discussion:
1. Setting secure permissions by default vs doing so only in the LinuxTaskController
The patch sets permissions for localized job files and directories as part of the localization process itself, that is as soon as the files are localized, rather than in the TaskController. I believe this approach was taken primarily to secure permissions on the localized files on the disk ASAP. The effect of doing so is that now all configurations of hadoop, including those which do not worry about security will have secure permissions for some of the files. While this should be OK in principle, it must be noted that without the entire solution offered by the LinuxTaskController, full security is not achieved. Hence, does it make sense to make changes to the permissions only when using the LinuxTaskController (with a small window where files could have potentially insecure permissions), and leave the default case alone ?
Either way, I think the current method in which permissions are changed for the localized files causes these to be splattered across the code. It may lead to mistakes in future if somebody is making changes (say to localize a new file) and is not fully aware of the permissions issue. Ideally it should be abstracted from the process. In speaking with Vinod about this, I realized this may not be very easy to achieve the way LocalDirAllocator currently works. If we keep the changes to the task controller alone, I believe this will be very easy to accomplish.
2. Approach for sharing common files between TT and child
The patch juggles around with permissions on the localized and intermediate files by first setting ownership to the job owner, and after the task is complete, changing the ownership back to the tasktracker user. The reason it does this, is for two purposes:
– To enable intermediate outputs to be served by the tasktracker, once the child has created them.
– To allow cleanup as the tasktracker once the tasks are done.
Note that change of ownership is done by launching the task-controller setuid binary as root privileges are required for the purpose.
The above two problems can be solved in another way as well (and this is what was discussed on
HADOOP-4490). In that approach, we'd thought of moving the intermediate outputs to a different directory owned by the tasktracker, and we'd thought of removing files and directories as the user, as part of the cleanup thread. I think one advantage this alternative approach brings is that it gives an opportunity to not launch the task-controller for reduces and hence slots can become free sooner. This does mean the cleanup might take more time, but that happens asynchronously.
On the other hand, the current model seems to be simpler to fit into mind. If a task is not complete, its files are owned by the job owner, if not, they are owned by TT. It is easy to check these rules on a running system. So, is it worth the change in approach to bring in a little optimization (which might actually not matter that much).
3. Sandboxing task runtime environment by changing values of mapred.local.dir for the child
The patch 'sandboxes' the task runtime environment, by changing the mapred.local.dir values from the original configured values to $
and pass the new values to the child. Vinod told me that this was primarily required because checks in the LocalDirAllocator require the current process to have write permissions on the context passed to it (which is basically the mapred local directories). When the child calls LocalDirAllocator, it would now fail because the original mapred local directories will not have write permissions to world. Hence, the need to sandbox. Of course, it is also probably more secure.
Two questions this raises: Is this change in contract acceptable ? If yes, is it acceptable in the default case as well, or should it be restricted to the case of LinuxTaskController alone.
Either way, the current patch, in order to implement this behavior juggles around the values of the mapred.local.dir variable in conf file at 2-3 locations in the tasktracker code. I think that approach is error prone and needs to change.
Thoughts on these three points ?