I've updated the patch to trunk, incorporating most of Arun's comments above. Arun, could you please take a look?
We should use mapred.local.dir instead of hadoop.tmp.dir in LinuxTaskController.
Use Path's methods instead of String manipulation for all path-related manipulations.
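To illustrate the point of this comment, here is a minimal sketch of why path objects are safer than string concatenation. It is not code from the patch: it uses java.nio.file.Path as a stand-in for Hadoop's Path so that it runs without Hadoop on the classpath, and the class name, method names and directory layout are made up for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; java.nio.file.Path stands in for Hadoop's Path.
class PathJoinSketch {

    // String manipulation: a trailing separator on localDir yields a doubled separator.
    static String joinWithStrings(String localDir, String user, String jobId) {
        return localDir + "/" + user + "/" + jobId;
    }

    // Path methods: separators are handled by the Path implementation.
    static Path joinWithPath(String localDir, String user, String jobId) {
        return Paths.get(localDir).resolve(user).resolve(jobId).normalize();
    }

    public static void main(String[] args) {
        // The directory layout below is an assumption made for the example.
        System.out.println(joinWithStrings("/tmp/local/", "alice", "job_1"));
        System.out.println(joinWithPath("/tmp/local/", "alice", "job_1"));
    }
}
```

The string version keeps the doubled separator from the trailing slash, while the Path version resolves each component and lets callers use methods such as getParent() and getFileName() instead of index arithmetic.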
Pass mode, user/group to DistributedCache rather than rely on the newly introduced DistributedCache.isFreshlyLoaded which is then unnecessary.
Done. I've added a new overloaded API that passes this information to DistributedCache. To keep options open, I've defined a new public class, DistributedCacheFileAccessInfo: a simple class that defines the permissions and ownership information for localized files in DistributedCache. Can you take a specific look at this and let me know if it looks OK?
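For readers without the patch at hand, a value class of this kind might look roughly like the sketch below. This is an assumption-laden illustration, not the class in the patch: the class name FileAccessInfoSketch, its fields, the constructor and the helper method are all hypothetical.

```java
import java.util.Objects;

// Hypothetical sketch; not the DistributedCacheFileAccessInfo class from the patch.
final class FileAccessInfoSketch {
    private final String owner;
    private final String group;
    private final int mode; // permission bits, e.g. 0750 (octal)

    FileAccessInfoSketch(String owner, String group, int mode) {
        // Reject anything outside the nine rwx permission bits.
        if ((mode & ~0777) != 0) {
            throw new IllegalArgumentException("mode must be within 0000..0777");
        }
        this.owner = Objects.requireNonNull(owner, "owner");
        this.group = Objects.requireNonNull(group, "group");
        this.mode = mode;
    }

    String getOwner() { return owner; }
    String getGroup() { return group; }
    int getMode() { return mode; }

    // Renders the mode in ls-style notation, one character per permission bit.
    String symbolicMode() {
        char[] flags = {'r', 'w', 'x'};
        StringBuilder sb = new StringBuilder(9);
        for (int bit = 8; bit >= 0; bit--) {
            boolean set = (mode & (1 << bit)) != 0;
            sb.append(set ? flags[(8 - bit) % 3] : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        FileAccessInfoSketch info = new FileAccessInfoSketch("alice", "hadoop", 0750);
        System.out.println(info.getOwner() + ":" + info.getGroup() + " " + info.symbolicMode());
    }
}
```

Keeping mode, owner and group in one immutable object means a localization call can take a single extra argument, and further attributes can be added later without another overload.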
Move setting up of JVM-specific files, e.g. the task's log directory, to TaskController.launchJVM.
This is the only comment I have not addressed. It was not clear what information is necessary at launch time. For example, if some localized files under the task cache directory need to be loaded at launch time, we will need permissions for those as well. In general, it seemed a little risky to launch the JVM without giving it full access to all the jars etc., even if the Task itself starts running only later. So I've left this as is. I think the main concern here was the special check I had in JvmManager, where I avoid setting the permissions again when getting the task to launch. This seems a simple enough check, and I've documented the rationale in the code. Can you verify this again and let me know your thoughts?