Before HADOOP-4490, files and archives localized as part of the distributed cache used to be given executable permissions. I suppose the assumption was that the directories and files created during localization were automatically readable by their owner. Since the owner of the files (i.e., the user the tasktracker runs as) was also the owner of the task process, this was sufficient for the task to access the cache files.
After HADOOP-4490, we had a situation where the tasktracker and the task could run as different users. The tracker localizes the files and the task needs to access them, so at a minimum, read and execute permissions on the directories and files had to be granted to others. As mentioned in the comment linked above, the choice was made to set these permissions recursively on all files starting from the base directory. This turned out to be a performance problem on clusters with a very large number of localized cache files.
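To make the cost of the recursive approach concrete, here is a minimal Python sketch, assuming a POSIX local file system. `grant_recursively` and `grant_others_rx` are hypothetical names used for illustration, not Hadoop code. Every entry under the base directory gets a chmod, so the work grows with the total number of cached files rather than with the number of files just localized.

```python
import os
import stat

# Bits granted to "others" so that a task running as a different user
# can traverse directories and read files (r-x for others).
OTHERS_RX = stat.S_IROTH | stat.S_IXOTH


def grant_others_rx(path: str) -> None:
    """Add r-x for others to a single path, keeping its existing bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | OTHERS_RX)


def grant_recursively(base_dir: str) -> int:
    """Hypothetical: chmod base_dir and every entry beneath it.

    Returns the number of chmod calls made. This count equals the number
    of entries in the whole tree, regardless of how many were new.
    """
    grant_others_rx(base_dir)
    calls = 1
    for root, dirs, files in os.walk(base_dir):
        for name in dirs + files:
            grant_others_rx(os.path.join(root, name))
            calls += 1
    return calls
```

Each localization that triggers this walk revisits every file already in the cache, which is why clusters with very many cached files pay for it.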
With MAPREDUCE-856, to meet the requirement of securing access to the distributed cache files, the local directory structure was changed to be per user. Further, in the LinuxTaskController, all files under a user's archive folder were made owned by that user, with permissions granting access only to that user. For the DefaultTaskController, the changes made in HADOOP-4490 were retained, though possibly unnecessarily.
First, to revisit whether we need any permission setting for distributed cache files:
I think this is still required. For the DefaultTaskController, executable permissions need to be set on the localized files, as in the pre-HADOOP-4490 days. For the LinuxTaskController, the task controller needs to change ownership and set permissions for that user.
However, in both cases, I suppose we only need to set permissions on files that are actually copied from DFS to the local file system (including any directories created in the process). This would address the issue raised in this JIRA.
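A minimal Python sketch of this narrower approach, under stated assumptions: `localize` is a hypothetical helper, `shutil.copyfile` stands in for the copy from DFS, and the cache root is assumed to already exist. Only the directories this call creates and the copied file itself get r-x for others; entries already in the cache are left alone.

```python
import os
import shutil
import stat

# r-x for others, as needed when the task runs as a different user.
OTHERS_RX = stat.S_IROTH | stat.S_IXOTH


def localize(src: str, cache_root: str, rel_path: str) -> list[str]:
    """Hypothetical: copy src to cache_root/rel_path, fix only new paths.

    shutil.copyfile stands in for the DFS-to-local copy. Returns the paths
    whose permissions were changed: directories this call had to create
    (outermost first) followed by the copied file.
    """
    dest = os.path.join(cache_root, rel_path)
    # Collect missing ancestor directories, innermost first.
    missing = []
    parent = os.path.dirname(dest)
    while not os.path.isdir(parent):
        missing.append(parent)
        parent = os.path.dirname(parent)
    created = []
    for d in reversed(missing):
        os.mkdir(d)
        created.append(d)
    shutil.copyfile(src, dest)
    created.append(dest)
    # Permission pass touches only what this localization produced.
    for p in created:
        mode = stat.S_IMODE(os.stat(p).st_mode)
        os.chmod(p, mode | OTHERS_RX)
    return created
```

Because the paths to fix are collected during the copy itself, the permission pass costs time proportional to what was localized, not to the size of the existing cache.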