Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-563 Security features for Map/Reduce
  3. MAPREDUCE-842

Per-job local data on the TaskTracker node should have right access-control

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      Modified TaskTracker and related classes so that per-job local data on the TaskTracker node has right access-control. Important changes:
       - All files/directories of the job on the TaskTracker are now user-owned by the job-owner and group-owner by a special TaskTracker's group.
       - The permissions of the file/directories are set to the most restrictive permissions possible.
       - Files/dirs shareable by all tasks of the job on this TT are set proper access control as soon as possible, i.e immediately after job-localization and those that are private to a single task are set access control after the corresponding task's localization.
       - Also fixes MAPREDUCE-131 which is related to a bug because of which tasks hang when the taskcontroller.cfg has multiple entries for mapred.local.dir
       - A new configuration entry hadoop.log.dir corresponding to the hadoop.log.dir in TT's configuration is now needed in task-controller.cfg so as to support restricted access control for userlogs of the tasks on the TaskTracker.
      Show
      Modified TaskTracker and related classes so that per-job local data on the TaskTracker node has right access-control. Important changes:  - All files/directories of the job on the TaskTracker are now user-owned by the job-owner and group-owner by a special TaskTracker's group.  - The permissions of the file/directories are set to the most restrictive permissions possible.  - Files/dirs shareable by all tasks of the job on this TT are set proper access control as soon as possible, i.e immediately after job-localization and those that are private to a single task are set access control after the corresponding task's localization.  - Also fixes MAPREDUCE-131 which is related to a bug because of which tasks hang when the taskcontroller.cfg has multiple entries for mapred.local.dir  - A new configuration entry hadoop.log.dir corresponding to the hadoop.log.dir in TT's configuration is now needed in task-controller.cfg so as to support restricted access control for userlogs of the tasks on the TaskTracker.
    1. HADOOP-4491-20090623-common.1.txt
      24 kB
      Vinod Kumar Vavilapalli
    2. HADOOP-4491-20090623-mapred.1.txt
      104 kB
      Vinod Kumar Vavilapalli
    3. HADOOP-4491-20090703.txt
      139 kB
      Vinod Kumar Vavilapalli
    4. HADOOP-4491-20090703-common.txt
      23 kB
      Vinod Kumar Vavilapalli
    5. HADOOP-4491-20090703.1.txt
      136 kB
      Vinod Kumar Vavilapalli
    6. HADOOP-4491-20090703-common.1.txt
      23 kB
      Vinod Kumar Vavilapalli
    7. HADOOP-4491-20090707-common.txt
      11 kB
      Vinod Kumar Vavilapalli
    8. HADOOP-4491-20090707.txt
      137 kB
      Vinod Kumar Vavilapalli
    9. HADOOP-4491-20090716-mapred.txt
      134 kB
      Vinod Kumar Vavilapalli
    10. HADOOP-4491-20090803.txt
      171 kB
      Vinod Kumar Vavilapalli
    11. HADOOP-4491-20090803.1.txt
      170 kB
      Vinod Kumar Vavilapalli
    12. HADOOP-4491-20090806.txt
      190 kB
      Vinod Kumar Vavilapalli
    13. HADOOP-4491-20090807.2.txt
      194 kB
      Vinod Kumar Vavilapalli
    14. HADOOP-4491-20090810.1.txt
      201 kB
      Vinod Kumar Vavilapalli
    15. HADOOP-4491-20090810.3.txt
      201 kB
      Vinod Kumar Vavilapalli
    16. HADOOP-4491-20090811.txt
      205 kB
      Vinod Kumar Vavilapalli
    17. HADOOP-4491-20090812.txt
      205 kB
      Vinod Kumar Vavilapalli
    18. 842.20S-4.patch
      208 kB
      Vinod Kumar Vavilapalli
    19. MR-842-follow-up.patch
      4 kB
      Hemanth Yamijala

      Issue Links

        Activity

        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #285 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/285/)
        MAPREDUCE-1635. ResourceEstimator does not work after MAPREDUCE-842. Contributed by Amareshwari Sriramadasu.

        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #285 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/285/ ) MAPREDUCE-1635 . ResourceEstimator does not work after MAPREDUCE-842 . Contributed by Amareshwari Sriramadasu.
        Hide
        Hemanth Yamijala added a comment -

        Uploading a follow-up patch for earlier hadoop versions that needs to be applied on top of backported patch for MAPREDUCE-181 and the 842.20S-4.patch. This patch fixes localization of the job token file to have correct permissions.

        Show
        Hemanth Yamijala added a comment - Uploading a follow-up patch for earlier hadoop versions that needs to be applied on top of backported patch for MAPREDUCE-181 and the 842.20S-4.patch. This patch fixes localization of the job token file to have correct permissions.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Patch against Hadoop 0.20 Yahoo! distribution. Not for commit here.

        Show
        Vinod Kumar Vavilapalli added a comment - Patch against Hadoop 0.20 Yahoo! distribution. Not for commit here.
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #47 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/47/)
        . Setup secure permissions for localized job files, intermediate outputs and log files on tasktrackers. Contributed by Vinod Kumar Vavilapalli.

        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #47 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/47/ ) . Setup secure permissions for localized job files, intermediate outputs and log files on tasktrackers. Contributed by Vinod Kumar Vavilapalli.
        Hide
        Hemanth Yamijala added a comment -

        I just committed this. Thanks, Vinod !

        Show
        Hemanth Yamijala added a comment - I just committed this. Thanks, Vinod !
        Hide
        Hemanth Yamijala added a comment -

        Apart from the documented contrib tests, the core test failure is MAPREDUCE-441. Going to commit this patch now.

        Show
        Hemanth Yamijala added a comment - Apart from the documented contrib tests, the core test failure is MAPREDUCE-441 . Going to commit this patch now.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12416292/HADOOP-4491-20090812.txt
        against trunk revision 803231.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 50 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/471/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/471/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/471/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/471/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416292/HADOOP-4491-20090812.txt against trunk revision 803231. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 50 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/471/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/471/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/471/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/471/console This message is automatically generated.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        I just ran the test cases that touch the changes in the last patch. I also generated docs and verified the documentation changes.

        The contrib test failures reported by Hudson are unrelated. Plz see MAPREDUCE-848 and MAPREDUCE-699.

        This patch is committable.

        Show
        Vinod Kumar Vavilapalli added a comment - I just ran the test cases that touch the changes in the last patch. I also generated docs and verified the documentation changes. The contrib test failures reported by Hudson are unrelated. Plz see MAPREDUCE-848 and MAPREDUCE-699 . This patch is committable.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Attaching a patch that

        • fixes the misspell of LOGFILE
        • fixes the documentation to reflect the most secure permissions for taskcontroller binary as (6050 root mapred) and the taskcontroller.cfg file as (400 owned by mapred)
        • solves a minor problem with the binary code on RHEL5. Looked like fts_read function from fts library isn't behaving exactly the same way like it does on my ubuntu system and is returning junk files after the whole tree is traversed. Worked around this by exiting manually when all the files in a directory tree are traversed.
        Show
        Vinod Kumar Vavilapalli added a comment - Attaching a patch that fixes the misspell of LOGFILE fixes the documentation to reflect the most secure permissions for taskcontroller binary as (6050 root mapred) and the taskcontroller.cfg file as (400 owned by mapred) solves a minor problem with the binary code on RHEL5. Looked like fts_read function from fts library isn't behaving exactly the same way like it does on my ubuntu system and is returning junk files after the whole tree is traversed. Worked around this by exiting manually when all the files in a directory tree are traversed.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12416158/HADOOP-4491-20090811.txt
        against trunk revision 803049.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 50 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/466/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/466/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/466/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/466/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416158/HADOOP-4491-20090811.txt against trunk revision 803049. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 50 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/466/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/466/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/466/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/466/console This message is automatically generated.
        Hide
        Hemanth Yamijala added a comment -

        +1 for the changes, except a minor nit: there's a typo for the 'LOGFILE' as 'OGFILE' under the #ifdef DEBUG flag in the task-controller code.

        Show
        Hemanth Yamijala added a comment - +1 for the changes, except a minor nit: there's a typo for the 'LOGFILE' as 'OGFILE' under the #ifdef DEBUG flag in the task-controller code.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Updated patch fixing the testcase problem and findbugs warnings.

        Show
        Vinod Kumar Vavilapalli added a comment - Updated patch fixing the testcase problem and findbugs warnings.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12416038/HADOOP-4491-20090810.3.txt
        against trunk revision 802954.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 47 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 5 new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/463/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/463/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/463/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/463/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416038/HADOOP-4491-20090810.3.txt against trunk revision 802954. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 47 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 5 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/463/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/463/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/463/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/463/console This message is automatically generated.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Updated patch that

        • Downgrades some of the log statements from info to DEBUG level, both in JAVA as well as the c code
        • Removes unused exit codes, macros and method declarations
        • Fixes some of the glitches in Forrest documentation
        Show
        Vinod Kumar Vavilapalli added a comment - Updated patch that Downgrades some of the log statements from info to DEBUG level, both in JAVA as well as the c code Removes unused exit codes, macros and method declarations Fixes some of the glitches in Forrest documentation
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12416027/HADOOP-4491-20090810.1.txt
        against trunk revision 802224.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 47 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/596/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416027/HADOOP-4491-20090810.1.txt against trunk revision 802224. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 47 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/596/console This message is automatically generated.
        Hide
        Hemanth Yamijala added a comment -

        Can we also weed out trace type log statements both in the the C and Java code, move them to a debug mode if needed ?

        Show
        Hemanth Yamijala added a comment - Can we also weed out trace type log statements both in the the C and Java code, move them to a debug mode if needed ?
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Attaching patch including most of the above comments.

        hadoop.log.dir in task-controller.cfg is not needed?

        hadoop.log.dir is still needed to set permissions on userlogs/attempt-id directories to be writable by the user's tasks and readable by TT for serving via web UI.

        One more point which I discussed with Vinod. The changes in common are at this point just a simple convenience wrapper API around the java.io.File APIs for setting permissions. I propose we move this into the mapred patch itself and open a JIRA to discuss the changes to common more carefully.

        Moved these changes into TaskTracker.PermissionsHandler. In future (via MAPREDUCE-759), this along with most of the remaining localization related code should be refactored into a separate class.

        Show
        Vinod Kumar Vavilapalli added a comment - Attaching patch including most of the above comments. hadoop.log.dir in task-controller.cfg is not needed? hadoop.log.dir is still needed to set permissions on userlogs/attempt-id directories to be writable by the user's tasks and readable by TT for serving via web UI. One more point which I discussed with Vinod. The changes in common are at this point just a simple convenience wrapper API around the java.io.File APIs for setting permissions. I propose we move this into the mapred patch itself and open a JIRA to discuss the changes to common more carefully. Moved these changes into TaskTracker.PermissionsHandler. In future (via MAPREDUCE-759 ), this along with most of the remaining localization related code should be refactored into a separate class.
        Hide
        Hemanth Yamijala added a comment -

        One more point which I discussed with Vinod. The changes in common are at this point just a simple convenience wrapper API around the java.io.File APIs for setting permissions. I propose we move this into the mapred patch itself and open a JIRA to discuss the changes to common more carefully.

        Show
        Hemanth Yamijala added a comment - One more point which I discussed with Vinod. The changes in common are at this point just a simple convenience wrapper API around the java.io.File APIs for setting permissions. I propose we move this into the mapred patch itself and open a JIRA to discuss the changes to common more carefully.
        Hide
        Hemanth Yamijala added a comment -

        Getting close. A few minor points:

        • Unrelated to this patch, in Child, we are adding the task's configuration file as a resource to the defaultConf of the Child. This is used only to get the proxy, one time when the Child starts (before the config file is added). So, it appears the adding of config file to the defaultConf variable can be removed.
        • The latest patch is not initializing the task attempt directories on all disks. This would mean the task may need to create them at runtime under the job directory. We wanted to the job directory to not have writable permissions for the user. Would this not be a problem ?
        • In the C code, the constant used for R and X for users is S_IREAD and S_IEXEC. These seem to be obsoleted in favor of S_IRUSR and S_IXUSR. Can we use those instead ?
        • hadoop.log.dir in task-controller.cfg is not needed ?
        • The test case for testing the permissions of the localized files and directories is very good. Thanks !
        • secure_path, change_owner and change_mode can be marked static in task-controller.c as these are somewhat core to the security work in the patch, and therefore need not be exposed.

        I also looked at the Forrest documentation and the javadocs:

        • NIT: Typo in TaskTracker.initializeJobDirs() javadoc - 'alter' should be 'later'
        • NIT: In javadocs, wherever we say 'download from DFS' should probably be 'download from FS', as it could be local also.
        • C documentation - needs update for initialize_job, prepare_task_logs
        Show
        Hemanth Yamijala added a comment - Getting close. A few minor points: Unrelated to this patch, in Child, we are adding the task's configuration file as a resource to the defaultConf of the Child. This is used only to get the proxy, one time when the Child starts (before the config file is added). So, it appears the adding of config file to the defaultConf variable can be removed. The latest patch is not initializing the task attempt directories on all disks. This would mean the task may need to create them at runtime under the job directory. We wanted to the job directory to not have writable permissions for the user. Would this not be a problem ? In the C code, the constant used for R and X for users is S_IREAD and S_IEXEC. These seem to be obsoleted in favor of S_IRUSR and S_IXUSR. Can we use those instead ? hadoop.log.dir in task-controller.cfg is not needed ? The test case for testing the permissions of the localized files and directories is very good. Thanks ! secure_path, change_owner and change_mode can be marked static in task-controller.c as these are somewhat core to the security work in the patch, and therefore need not be exposed. I also looked at the Forrest documentation and the javadocs: NIT: Typo in TaskTracker.initializeJobDirs() javadoc - 'alter' should be 'later' NIT: In javadocs, wherever we say 'download from DFS' should probably be 'download from FS', as it could be local also. C documentation - needs update for initialize_job, prepare_task_logs
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Updated patch changing the forrest documentation to reflect the changes.

        Show
        Vinod Kumar Vavilapalli added a comment - Updated patch changing the forrest documentation to reflect the changes.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Attaching patch that includes the changes suggest in the review. Also, has an added test that verifies permissions set on files at various steps. Running tests with the linux task-controller now.

        Show
        Vinod Kumar Vavilapalli added a comment - Attaching patch that includes the changes suggest in the review. Also, has an added test that verifies permissions set on files at various steps. Running tests with the linux task-controller now.
        Hide
        Hemanth Yamijala added a comment -

        I missed out one more point. In initialize_job we should be setting the permissions as 2570 recursively. Only the work directory needs to open up write for the user.

        Show
        Hemanth Yamijala added a comment - I missed out one more point. In initialize_job we should be setting the permissions as 2570 recursively. Only the work directory needs to open up write for the user.
        Hide
        Hemanth Yamijala added a comment -

        Some comments on the task-controller code:

        • Code in run_task_as_user can reuse code from initialize_task
        • I think test-task-controller can be moved to a test folder following convention from other C++ modules.
        • The log file for the taskcontroller in open_log_file should be closed if we can't change permissions to it.
        • When there's an error in initializing job for any of the m/r local dirs, we should free the job_dir parameter, right ? Similarly in initializing task.
        • Task log dir for cleanup tasks will not have the name task-attempt-id.cleanup. This seems fine right now. Because we check if the directory exists and return if it doesn't. Can we comment this ?
        • Variable jars_dir not being used in run_task_as_user.
        • job_dir needs to be freed before execlp
        Show
        Hemanth Yamijala added a comment - Some comments on the task-controller code: Code in run_task_as_user can reuse code from initialize_task I think test-task-controller can be moved to a test folder following convention from other C++ modules. The log file for the taskcontroller in open_log_file should be closed if we can't change permissions to it. When there's an error in initializing job for any of the m/r local dirs, we should free the job_dir parameter, right ? Similarly in initializing task. Task log dir for cleanup tasks will not have the name task-attempt-id.cleanup. This seems fine right now. Because we check if the directory exists and return if it doesn't. Can we comment this ? Variable jars_dir not being used in run_task_as_user. job_dir needs to be freed before execlp
        Hide
        Hemanth Yamijala added a comment -

        One more point. I noticed that ReduceTask.ReduceCopier.configureClasspath() is using Task.getJobFile(). Will this be a problem given we are setting the value of the job file just before it is written to disk ?

        Show
        Hemanth Yamijala added a comment - One more point. I noticed that ReduceTask.ReduceCopier.configureClasspath() is using Task.getJobFile(). Will this be a problem given we are setting the value of the job file just before it is written to disk ?
        Hide
        Hemanth Yamijala added a comment -

        This is looking much simpler and good. Some comments, mostly minor:

        • When mapred.local.dir is changed in TaskRunner for sandboxing the child, it affects the JobConf instance in the TaskInProgress. This seems error prone. Can we check if this can be moved to the Child directly.
        • Add a comment saying why permissions are being set for the job and task directories at localization time itself, as opposed to inside the task-controller executable. This is mainly to narrow down the possibility of misuse, until we do a change of the group ownership.
        • In the same line, set permissions to tasklog attempt directory to restrictive permissions.
        • Add a comment to explain that split.dta is only used in the case of IsolationRunner where the changes are made in IsolationRunner.
        • We are still passing the log directory to the task-controller - doesn't seem needed.
        • I would suggest InitializeJobContext to be named as JobInitializationContext; doJobLocalization to localizeJobFiles and likewise for tasks.

        I have still to look at the task-controller changes, will take the latest patch for the same.

        Show
        Hemanth Yamijala added a comment - This is looking much simpler and good. Some comments, mostly minor: When mapred.local.dir is changed in TaskRunner for sandboxing the child, it affects the JobConf instance in the TaskInProgress. This seems error prone. Can we check if this can be moved to the Child directly. Add a comment saying why permissions are being set for the job and task directories at localization time itself, as opposed to inside the task-controller executable. This is mainly to narrow down the possibility of misuse, until we do a change of the group ownership. In the same line, set permissions to tasklog attempt directory to restrictive permissions. Add a comment to explain that split.dta is only used in the case of IsolationRunner where the changes are made in IsolationRunner. We are still passing the log directory to the task-controller - doesn't seem needed. I would suggest InitializeJobContext to be named as JobInitializationContext; doJobLocalization to localizeJobFiles and likewise for tasks. I have still to look at the task-controller changes, will take the latest patch for the same.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Updated patch including the review comments on the task-controller binary code.

        Show
        Vinod Kumar Vavilapalli added a comment - Updated patch including the review comments on the task-controller binary code.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Uploading a new patch. The last patch for the common project holds.

        • This patch implements the above proposal in spirit with some minor modifications.
        • Re factors localization code in TaskTracker and TaskRunner a bit so as to support writing test cases.

        The patch is functionally complete, but not the cleanest. Some of the minor review comments on the linux task-controller binary code are still in progress.

        Show
        Vinod Kumar Vavilapalli added a comment - Uploading a new patch. The last patch for the common project holds. This patch implements the above proposal in spirit with some minor modifications. Re factors localization code in TaskTracker and TaskRunner a bit so as to support writing test cases. The patch is functionally complete, but not the cleanest. Some of the minor review comments on the linux task-controller binary code are still in progress.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Going forward with the latest proposal.

        Salient points:

        • Same directory structure as what is in the trunk.
        • Actions taken up the linux-taskcontroller:
          • initializeJob
            • sudo chown user:mapred -R taskTracker/jobcache/$jobid
            • sudo chmod 570 -R taskTracker/jobcache/$jobid
          • initializeTask
            • sudo chown user:mapred -R taskTracker/jobcache/$jobid/$attemptid/
            • sudo chmod 770 -R taskTracker/jobcache/$jobid/$attemptid/
            • sudo chmod +s taskTracker/jobcache/$jobid/$attemptid
            • sudo chown user:mapred log-dir/userlogs/$attemptid
            • sudo chmod 770 log-dir/userlogs/$attemptid
            • sudo chmod +s log-dir/userlogs/$attemptid

        child umask 0007

        Show
        Vinod Kumar Vavilapalli added a comment - Going forward with the latest proposal. Salient points: Same directory structure as what is in the trunk. Actions taken up the linux-taskcontroller: initializeJob sudo chown user:mapred -R taskTracker/jobcache/$jobid sudo chmod 570 -R taskTracker/jobcache/$jobid initializeTask sudo chown user:mapred -R taskTracker/jobcache/$jobid/$attemptid/ sudo chmod 770 -R taskTracker/jobcache/$jobid/$attemptid/ sudo chmod +s taskTracker/jobcache/$jobid/$attemptid sudo chown user:mapred log-dir/userlogs/$attemptid sudo chmod 770 log-dir/userlogs/$attemptid sudo chmod +s log-dir/userlogs/$attemptid child umask 0007
        Hide
        Hemanth Yamijala added a comment -

        Owen and I had an offline discussion about this, and we felt another approach to try out was to see if we could have the directories and files owned by the user and group-owned by the tasktracker. The group ownership should be sticky so permissions are inherited. The permissions must apply for all the relevant components in the paths.

        So, $jobid and $attemptid in the examples above would be owned by the user, group-owned by mapred, and have permissions like 570 or similar.

        This might also remove the need to have parallel directory structures.

        The rationale for this approach follows from the fact that for maximum security the task-controller executable needs to be group owned by the tasktracker (to prevent other users from launching it). Hence, this almost means that the tasktracker user is a special user in the system that is required for secure installations. And it can be setup such that the user is in a separate group on its own.

        Thoughts ?

        Show
        Hemanth Yamijala added a comment - Owen and I had an offline discussion about this, and we felt another approach to try out was to see if we could have the directories and files owned by the user and group-owned by the tasktracker. The group ownership should be sticky so permissions are inherited. The permissions must apply for all the relevant components in the paths. So, $jobid and $attemptid in the examples above would be owned by the user, group-owned by mapred, and have permissions like 570 or similar. This might also remove the need to have parallel directory structures. The rationale for this approach follows from the fact that for maximum security the task-controller executable needs to be group owned by the tasktracker (to prevent other users from launching it). Hence, this almost means that the tasktracker user is a special user in the system that is required for secure installations. And it can be setup such that the user is in a separate group on its own. Thoughts ?
        Hide
        Hemanth Yamijala added a comment -

        Use acls on the directories to limit access to just the mapred and job user.

        Are you referring to the ACLs that use extended attributes on the file system. Their support should be explicitly enabled in the kernel, and also supported by the file system type, right ? If yes, doesn't this approach make the utility of this feature much more restrictive ?

        It seems like option 1 is a slightly better option in that case...

        Show
        Hemanth Yamijala added a comment - Use acls on the directories to limit access to just the mapred and job user. Are you referring to the ACLs that use extended attributes on the file system. Their support should be explicitly enabled in the kernel, and also supported by the file system type, right ? If yes, doesn't this approach make the utility of this feature much more restrictive ? It seems like option 1 is a slightly better option in that case...
        Hide
        Owen O'Malley added a comment -

        After a little playing around with Linux, it looks like it blocks using hard links for directories, even for root. frown

        That leaves us two options for dealing with the logs directory:
        1. Create the files and make hard links on the files and make sure the files are appended to.
        2. Use acls on the directories to limit access to just the mapred and job user.

        I'm leaning toward the second one.

        $ttroot - mapred, 755
           |
           |- jobs - mapred 755
           |     |
           |     '-- $jobid - mapred 700 + rw access by $user
           |            |
           |            |- distcache - mapred 755
           |            |- jars - mapred 755
           |            |    '-- job.jar
           |            `-- $attemptid - mapred 755
           |                  |- job.xml
           |                  |- taskjvm.sh
           |                  |- work - $user 700
           |                  |- logs - $user 755
           |                  `-- output - $user 755
           '-- system
                |
                `-- $jobid
                     |
                     |- job.xml
        

        So all of the protection is at the $jobid level. The user only has write access to the work, logs, and output directories.

        Thoughts?

        Show
        Owen O'Malley added a comment - After a little playing around with Linux, it looks like it blocks using hard links for directories, even for root. frown That leaves us two options for dealing with the logs directory: 1. Create the files and make hard links on the files and make sure the files are appended to. 2. Use acls on the directories to limit access to just the mapred and job user. I'm leaning toward the second one. $ttroot - mapred, 755 | |- jobs - mapred 755 | | | '-- $jobid - mapred 700 + rw access by $user | | | |- distcache - mapred 755 | |- jars - mapred 755 | | '-- job.jar | `-- $attemptid - mapred 755 | |- job.xml | |- taskjvm.sh | |- work - $user 700 | |- logs - $user 755 | `-- output - $user 755 '-- system | `-- $jobid | |- job.xml So all of the protection is at the $jobid level. The user only has write access to the work, logs, and output directories. Thoughts?
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Broadly, there are two directory strucutres - system and users

        • system directory will be owned by mapreduce, thereby protecting the contents.
        • users is 755, owned by mapreduce
        • users/$jobid is clearly 700 and owned by the user.
        • system/$jobid/outputs can be directly $ttroot/ as was discussed offline. But I've left it inside system/$jobid as the $jobid directory seemed reduntant to me. In any case, the outputs once moved need to owned by the TT.
        • all of the files localized by the TT are written into system/$jobid
        • After job localization is done, all files under system/$jobid/userfiles are moved to users/$jobid to be consumed by the user's task and so owned by the user.
        • After task localization is done, the whole directory system/$jobid/$taskid is moved to users/$jobid/ and owned by the user.
        • when the task finishes, the whole users/$user/$jobid/$attemptid/output directory needs to be moved to outputs/$jobid/$attemptid.
        • cleaning up of a task is removal of users//$jobid/$attemptid
        • cleaning up a job is removal of users//$jobid, system/$jobid

        These changs will be needed for both DefaultTaskController as well as the LinuxTaskController.

        LinuxTaskController uses the setuid binary to do the move operations as the root and changing ownership of the target files to the user.

        Distributed cache files and the log files still need to be baked into this structure.

        Show
        Vinod Kumar Vavilapalli added a comment - Broadly, there are two directory strucutres - system and users system directory will be owned by mapreduce, thereby protecting the contents. users is 755, owned by mapreduce users/$jobid is clearly 700 and owned by the user. system/$jobid/outputs can be directly $ttroot/ as was discussed offline. But I've left it inside system/$jobid as the $jobid directory seemed reduntant to me. In any case, the outputs once moved need to owned by the TT. all of the files localized by the TT are written into system/$jobid After job localization is done, all files under system/$jobid/userfiles are moved to users/$jobid to be consumed by the user's task and so owned by the user. After task localization is done, the whole directory system/$jobid/$taskid is moved to users/$jobid/ and owned by the user. when the task finishes, the whole users/$user/$jobid/$attemptid/output directory needs to be moved to outputs/$jobid/$attemptid. cleaning up of a task is removal of users//$jobid/$attemptid cleaning up a job is removal of users//$jobid, system/$jobid These changs will be needed for both DefaultTaskController as well as the LinuxTaskController. LinuxTaskController uses the setuid binary to do the move operations as the root and changing ownership of the target files to the user. Distributed cache files and the log files still need to be baked into this structure.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        I had some offline discussions with Hemanth/Owen/Arun. Those discussions resulted in a change of proposal for this JIRA.

        The TaskTracker's directory structure will be like the following:

        $ttroot
           |
           |- distcache
           |
           |- users
           |     |
           |     '-- $jobid
           |            |
           |            |- jars
           |            |    '-- job.jar
           |            |- work
           |            |
           |            `-- $attemptid
           |                  |- job.xml
           |                  |- taskjvm.sh
           |                  |- work
           |                  |      '-- tmp
           |                  `-- output
           |
           '-- system
                |
                `-- $jobid
                     |
                     |- job.xml
                     |
                     |- userfiles
                     |       |- jars
                     |       |    '-- job.jar
                     |       '-- work
                     |
                     |- $attemptid
                     |        |- job.xml
                     |        |- taskjvm.sh
                     |        '-- work
                     |              '-- tmp
                     |
                     '-- outputs
                             `-- $attemptid 
        
        Show
        Vinod Kumar Vavilapalli added a comment - I had some offline discussions with Hemanth/Owen/Arun. Those discussions resulted in a change of proposal for this JIRA. The TaskTracker's directory structure will be like the following: $ttroot | |- distcache | |- users | | | '-- $jobid | | | |- jars | | '-- job.jar | |- work | | | `-- $attemptid | |- job.xml | |- taskjvm.sh | |- work | | '-- tmp | `-- output | '-- system | `-- $jobid | |- job.xml | |- userfiles | |- jars | | '-- job.jar | '-- work | |- $attemptid | |- job.xml | |- taskjvm.sh | '-- work | '-- tmp | '-- outputs `-- $attemptid
        Hide
        Hemanth Yamijala added a comment -

        I looked at the C code that was pending from previous review. Looking good overall. I have a few comments - mostly minor:

        • Should the pointer returned by strdup in configuration.get_value() be freed ? It doesn't seem to be in some cases.
        • Return value of any code path calling concatenate should be checked, as it can be null.
        • I think we can use stat instead of opendir to check for existence of path.
        • Check return value of fts calls.
        • I think it may be good to use the flag FTS_NOSTAT as we don't use stat info and it looks like it is an optimization to do so.
        • It may be safe to handle a default case in secure_path, just in case versions of FTS have types we are not handling. For e.g we are not handling FTS_ERR.
        • The value of the process_flag seems against the name. It should be true (non zero) if we want to process and false (zero) otherwise. Then code like if (!process_path) continue; can be used. Similarly for the variables 'failed' and 'first_task'
        • We can use strerror that will print a standard message per errno itself, for calls like chown etc. rather than printing the error message ourselves based on errno.
        • What are the permissions for the tasklog attempt directories ? Since they're not being set, will they take the value of the default umask ?
        • check_owner is removed. Need to make sure this is ok.
        • Let's add debug prints into #ifdef DEBUG or something similar.
        • Looks like run_task_as_user shares code with initialize_task. Can it call initialize_task directly ?
        • In TERMINATE_TASK_JVM, we are not using first_task - remove the argument to task-controller. Also for clarity, I think we can call the argument as cleanup_work_dirs or something like that, rather than first_task.
        • There seems to be a spurious call to open_log_file in TERMINATE_TASK_JVM.
        • display_usage seems out of place in task-controller.c. Can we move it to main ?
        Show
        Hemanth Yamijala added a comment - I looked at the C code that was pending from previous review. Looking good overall. I have a few comments - mostly minor: Should the pointer returned by strdup in configuration.get_value() be freed ? It doesn't seem to be in some cases. Return value of any code path calling concatenate should be checked, as it can be null. I think we can use stat instead of opendir to check for existence of path. Check return value of fts calls. I think it may be good to use the flag FTS_NOSTAT as we don't use stat info and it looks like it is an optimization to do so. It may be safe to handle a default case in secure_path, just in case versions of FTS have types we are not handling. For e.g we are not handling FTS_ERR. The value of the process_flag seems against the name. It should be true (non zero) if we want to process and false (zero) otherwise. Then code like if (!process_path) continue; can be used. Similarly for the variables 'failed' and 'first_task' We can use strerror that will print a standard message per errno itself, for calls like chown etc. rather than printing the error message ourselves based on errno. What are the permissions for the tasklog attempt directories ? Since they're not being set, will they take the value of the default umask ? check_owner is removed. Need to make sure this is ok. Let's add debug prints into #ifdef DEBUG or something similar. Looks like run_task_as_user shares code with initialize_task. Can it call initialize_task directly ? In TERMINATE_TASK_JVM, we are not using first_task - remove the argument to task-controller. Also for clarity, I think we can call the argument as cleanup_work_dirs or something like that, rather than first_task. There seems to be a spurious call to open_log_file in TERMINATE_TASK_JVM. display_usage seems out of place in task-controller.c. Can we move it to main ?
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Uploading mapred part of the patch that incorporates most of the review comments.

        • In the case of jvm reuse, we still need to make the output available. Because it seems the task completion event is sent as soon as the task is done not after jvm exits. This is only a theoretical case possibly, but it still will be good to keep the code paths identical.
        • finalizeTaskDirs (or parts of it) needs to be synchronized.
        • Path permissions should be taken care of ? so, if mapred.local.dir doesn't have execute permissions for others, we need to set them.
        • In setup, we seem to be setting permissions even if directory creation fails. also shouldn't we set permissions even if it exists.. so that it is right as per our requirement.
        • Cache directory probably doesn't need 777 because it is not written to by the tasks. We can probably retain this if HADOOP-4490 set like permissions, since this will be addressed in HADOOP-4493.
        • getBaseIntermediateOutputDir seems an overkill if it is just returning a constant.
        • Changes in SpillRecord seem to be unnecessary.
        • TaskController.FILE_PERMISSIONS doesn't seem to be used anywhere.
        • Rename finalizeTaskDirs as finalizeTask - it matches with the naming convention for the other apis in taskcontroller.
        • Add a comment about why task-work is 755.
        • mapred.child.local.dir has an extra comma at the end.

        – Done

        • Permissions for job directory is changed multiple times - once per each task.
          • Done. Added a new INITIALIZE_JOB command.
        • Code related to creation of work dirs is removed from TaskRunner. Where is it created now ?
          • This is actually not needed. The work dirs are already created before at TaskTracker.localizeTask() (creation of cwd)
        • Didn't understand the purpose of initStatus. Since it starts out being true, wouldn't it always remain true ?
          • initStatus is used to track if directory creation is failing on all the disks. It was incorrectly initialized to true. Fixed this.

        Things to be done:

        • The code models some of the commands passed to the taskcontroller as 'first task' for JVM or 'not first task' for JVM. This complicates the contract between the TT and the task-controller. I think at a minimum these decisions should be restricted only to the JVM manager, and the contract between the TT and task-controller should be handled by modelling new commands like FINALIZE_JVM and FINALIZE_TASK.
        • The patch introduces a log directory argument to the task-controller. I think this is not required as there is only one value for the hadoop.log.dir for all tasks.
        • TODOs in the patch must be discussed and resolved.
        • Rename finalizeTaskDirs as finalizeTask. To be done in the task-controller code

        1. Setting secure permissions by default vs doing so only in the LinuxTaskController
        2. Approach for sharing common files between TT and child
        3. Sandboxing task runtime environment by changing values of mapred.local.dir for the child

        Show
        Vinod Kumar Vavilapalli added a comment - Uploading mapred part of the patch that incorporates most of the review comments. In the case of jvm reuse, we still need to make the output available. Because it seems the task completion event is sent as soon as the task is done not after jvm exits. This is only a theoretical case possibly, but it still will be good to keep the code paths identical. finalizeTaskDirs (or parts of it) needs to be synchronized. Path permissions should be taken care of ? so, if mapred.local.dir doesn't have execute permissions for others, we need to set them. In setup, we seem to be setting permissions even if directory creation fails. also shouldn't we set permissions even if it exists.. so that it is right as per our requirement. Cache directory probably doesn't need 777 because it is not written to by the tasks. We can probably retain this if HADOOP-4490 set like permissions, since this will be addressed in HADOOP-4493 . getBaseIntermediateOutputDir seems an overkill if it is just returning a constant. Changes in SpillRecord seem to be unnecessary. TaskController.FILE_PERMISSIONS doesn't seem to be used anywhere. Rename finalizeTaskDirs as finalizeTask - it matches with the naming convention for the other apis in taskcontroller. Add a comment about why task-work is 755. mapred.child.local.dir has an extra comma at the end. – Done Permissions for job directory is changed multiple times - once per each task. Done. Added a new INITIALIZE_JOB command. Code related to creation of work dirs is removed from TaskRunner. Where is it created now ? This is actually not needed. The work dirs are already created before at TaskTracker.localizeTask() (creation of cwd) Didn't understand the purpose of initStatus. Since it starts out being true, wouldn't it always remain true ? initStatus is used to track if directory creation is failing on all the disks. It was incorrectly initialized to true. Fixed this. Things to be done: The code models some of the commands passed to the taskcontroller as 'first task' for JVM or 'not first task' for JVM. This complicates the contract between the TT and the task-controller. I think at a minimum these decisions should be restricted only to the JVM manager, and the contract between the TT and task-controller should be handled by modelling new commands like FINALIZE_JVM and FINALIZE_TASK. The patch introduces a log directory argument to the task-controller. I think this is not required as there is only one value for the hadoop.log.dir for all tasks. TODOs in the patch must be discussed and resolved. Rename finalizeTaskDirs as finalizeTask. To be done in the task-controller code 1. Setting secure permissions by default vs doing so only in the LinuxTaskController 2. Approach for sharing common files between TT and child 3. Sandboxing task runtime environment by changing values of mapred.local.dir for the child
        Hide
        Hemanth Yamijala added a comment -

        Some other comments on the code:

        • The code models some of the commands passed to the taskcontroller as 'first task' for JVM or 'not first task' for JVM. This complicates the contract between the TT and the task-controller. I think at a minimum these decisions should be restricted only to the JVM manager, and the contract between the TT and task-controller should be handled by modelling new commands like FINALIZE_JVM and FINALIZE_TASK.
        • The patch introduces a log directory argument to the task-controller. I think this is not required as there is only one value for the hadoop.log.dir for all tasks.
        • Code related to creation of work dirs is removed from TaskRunner. Where is it created now ?
        • Permissions for job directory is changed multiple times - once per each task.
        • finalizeTaskDirs (or parts of it) needs to be synchronized.
        • In the case of jvm reuse, we still need to make the output available. Because it seems the task completion event is sent as soon as the task is done not after jvm exits. This is only a theoretical case possibly, but it still will be good to keep the code paths identical.
        • Path permissions should be taken care of ? so, if mapred.local.dir doesn't have execute permissions for others, we need to set them.
        • In setup, we seem to be setting permissions even if directory creation fails. also shouldn't we set permissions even if it exists.. so that it is right as per our requirement.
        • Didn't understand the purpose of initStatus. Since it starts out being true, wouldn't it always remain true ?
        • Cache directory probably doesn't need 777 because it is not written to by the tasks. We can probably retain this if HADOOP-4490 set like permissions, since this will be addressed in HADOOP-4493.
        • getBaseIntermediateOutputDir seems an overkill if it is just returning a constant.
        • Changes in SpillRecord seem to be unnecessary.

        Some nits:

        • There are some else blocks without code, but with a code comment explaining why there's no else block. While useful, it is somewhat unconventional. I would recommend the reason be moved to a comment starting the if block itself.
        • TODOs in the patch must be discussed and resolved.
        • TaskController.FILE_PERMISSIONS doesn't seem to be used anywhere.
        • Rename finalizeTaskDirs as finalizeTask - it matches with the naming convention for the other apis in taskcontroller.
        • Add a comment about why task-work is 755.
        • mapred.child.local.dir has an extra comma at the end.
        Show
        Hemanth Yamijala added a comment - Some other comments on the code: The code models some of the commands passed to the taskcontroller as 'first task' for JVM or 'not first task' for JVM. This complicates the contract between the TT and the task-controller. I think at a minimum these decisions should be restricted only to the JVM manager, and the contract between the TT and task-controller should be handled by modelling new commands like FINALIZE_JVM and FINALIZE_TASK. The patch introduces a log directory argument to the task-controller. I think this is not required as there is only one value for the hadoop.log.dir for all tasks. Code related to creation of work dirs is removed from TaskRunner. Where is it created now ? Permissions for job directory is changed multiple times - once per each task. finalizeTaskDirs (or parts of it) needs to be synchronized. In the case of jvm reuse, we still need to make the output available. Because it seems the task completion event is sent as soon as the task is done not after jvm exits. This is only a theoretical case possibly, but it still will be good to keep the code paths identical. Path permissions should be taken care of ? so, if mapred.local.dir doesn't have execute permissions for others, we need to set them. In setup, we seem to be setting permissions even if directory creation fails. also shouldn't we set permissions even if it exists.. so that it is right as per our requirement. Didn't understand the purpose of initStatus. Since it starts out being true, wouldn't it always remain true ? Cache directory probably doesn't need 777 because it is not written to by the tasks. We can probably retain this if HADOOP-4490 set like permissions, since this will be addressed in HADOOP-4493 . getBaseIntermediateOutputDir seems an overkill if it is just returning a constant. Changes in SpillRecord seem to be unnecessary. Some nits: There are some else blocks without code, but with a code comment explaining why there's no else block. While useful, it is somewhat unconventional. I would recommend the reason be moved to a comment starting the if block itself. TODOs in the patch must be discussed and resolved. TaskController.FILE_PERMISSIONS doesn't seem to be used anywhere. Rename finalizeTaskDirs as finalizeTask - it matches with the naming convention for the other apis in taskcontroller. Add a comment about why task-work is 755. mapred.child.local.dir has an extra comma at the end.
        Hide
        Hemanth Yamijala added a comment -

        I would like to call out some approaches this patch is taking explicitly, so that there's an opportunity for discussion:

        1. Setting secure permissions by default vs doing so only in the LinuxTaskController

        The patch sets permissions for localized job files and directories as part of the localization process itself, that is as soon as the files are localized, rather than in the TaskController. I believe this approach was taken primarily to secure permissions on the localized files on the disk ASAP. The effect of doing so is that now all configurations of hadoop, including those which do not worry about security will have secure permissions for some of the files. While this should be OK in principle, it must be noted that without the entire solution offered by the LinuxTaskController, full security is not achieved. Hence, does it make sense to make changes to the permissions only when using the LinuxTaskController (with a small window where files could have potentially insecure permissions), and leave the default case alone ?

        Either way, I think the current method in which permissions are changed for the localized files causes these to be splattered across the code. It may lead to mistakes in future if somebody is making changes (say to localize a new file) and is not fully aware of the permissions issue. Ideally it should be abstracted from the process. In speaking with Vinod about this, I realized this may not be very easy to achieve the way LocalDirAllocator currently works. If we keep the changes to the task controller alone, I believe this will be very easy to accomplish.

        2. Approach for sharing common files between TT and child

        The patch juggles around with permissions on the localized and intermediate files by first setting ownership to the job owner, and after the task is complete, changing the ownership back to the tasktracker user. The reason it does this, is for two purposes:
        – To enable intermediate outputs to be served by the tasktracker, once the child has created them.
        – To allow cleanup as the tasktracker once the tasks are done.

        Note that change of ownership is done by launching the task-controller setuid binary as root privileges are required for the purpose.

        The above two problems can be solved in another way as well (and this is what was discussed on HADOOP-4490). In that approach, we'd thought of moving the intermediate outputs to a different directory owned by the tasktracker, and we'd thought of removing files and directories as the user, as part of the cleanup thread. I think one advantage this alternative approach brings is that it gives an opportunity to not launch the task-controller for reduces and hence slots can become free sooner. This does mean the cleanup might take more time, but that happens asynchronously.

        On the other hand, the current model seems to be simpler to fit into mind. If a task is not complete, its files are owned by the job owner, if not, they are owned by TT. It is easy to check these rules on a running system. So, is it worth the change in approach to bring in a little optimization (which might actually not matter that much).

        3. Sandboxing task runtime environment by changing values of mapred.local.dir for the child

        The patch 'sandboxes' the task runtime environment, by changing the mapred.local.dir values from the original configured values to $

        {original mapred.local.dir}

        /taskTracker/jobCache/$

        {job-id}

        /$

        {task-id}

        and pass the new values to the child. Vinod told me that this was primarily required because checks in the LocalDirAllocator require the current process to have write permissions on the context passed to it (which is basically the mapred local directories). When the child calls LocalDirAllocator, it would now fail because the original mapred local directories will not have write permissions to world. Hence, the need to sandbox. Of course, it is also probably more secure.

        Two questions this raises: Is this change in contract acceptable ? If yes, is it acceptable in the default case as well, or should it be restricted to the case of LinuxTaskController alone.

        Either way, the current patch, in order to implement this behavior juggles around the values of the mapred.local.dir variable in conf file at 2-3 locations in the tasktracker code. I think that approach is error prone and needs to change.

        Thoughts on these three points ?

        Show
        Hemanth Yamijala added a comment - I would like to call out some approaches this patch is taking explicitly, so that there's an opportunity for discussion: 1. Setting secure permissions by default vs doing so only in the LinuxTaskController The patch sets permissions for localized job files and directories as part of the localization process itself, that is as soon as the files are localized, rather than in the TaskController. I believe this approach was taken primarily to secure permissions on the localized files on the disk ASAP. The effect of doing so is that now all configurations of hadoop, including those which do not worry about security will have secure permissions for some of the files. While this should be OK in principle, it must be noted that without the entire solution offered by the LinuxTaskController, full security is not achieved. Hence, does it make sense to make changes to the permissions only when using the LinuxTaskController (with a small window where files could have potentially insecure permissions), and leave the default case alone ? Either way, I think the current method in which permissions are changed for the localized files causes these to be splattered across the code. It may lead to mistakes in future if somebody is making changes (say to localize a new file) and is not fully aware of the permissions issue. Ideally it should be abstracted from the process. In speaking with Vinod about this, I realized this may not be very easy to achieve the way LocalDirAllocator currently works. If we keep the changes to the task controller alone, I believe this will be very easy to accomplish. 2. Approach for sharing common files between TT and child The patch juggles around with permissions on the localized and intermediate files by first setting ownership to the job owner, and after the task is complete, changing the ownership back to the tasktracker user. The reason it does this, is for two purposes: – To enable intermediate outputs to be served by the tasktracker, once the child has created them. – To allow cleanup as the tasktracker once the tasks are done. Note that change of ownership is done by launching the task-controller setuid binary as root privileges are required for the purpose. The above two problems can be solved in another way as well (and this is what was discussed on HADOOP-4490 ). In that approach, we'd thought of moving the intermediate outputs to a different directory owned by the tasktracker, and we'd thought of removing files and directories as the user, as part of the cleanup thread. I think one advantage this alternative approach brings is that it gives an opportunity to not launch the task-controller for reduces and hence slots can become free sooner. This does mean the cleanup might take more time, but that happens asynchronously. On the other hand, the current model seems to be simpler to fit into mind. If a task is not complete, its files are owned by the job owner, if not, they are owned by TT. It is easy to check these rules on a running system. So, is it worth the change in approach to bring in a little optimization (which might actually not matter that much). 3. Sandboxing task runtime environment by changing values of mapred.local.dir for the child The patch 'sandboxes' the task runtime environment, by changing the mapred.local.dir values from the original configured values to $ {original mapred.local.dir} /taskTracker/jobCache/$ {job-id} /$ {task-id} and pass the new values to the child. Vinod told me that this was primarily required because checks in the LocalDirAllocator require the current process to have write permissions on the context passed to it (which is basically the mapred local directories). When the child calls LocalDirAllocator, it would now fail because the original mapred local directories will not have write permissions to world. Hence, the need to sandbox. Of course, it is also probably more secure. Two questions this raises: Is this change in contract acceptable ? If yes, is it acceptable in the default case as well, or should it be restricted to the case of LinuxTaskController alone. Either way, the current patch, in order to implement this behavior juggles around the values of the mapred.local.dir variable in conf file at 2-3 locations in the tasktracker code. I think that approach is error prone and needs to change. Thoughts on these three points ?
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Adding new patches:

        • Cleaned up the common part of the code a little. After going through the patch, realized that the changes to LocalDirAllocator can be done away with and so reverted most of the changes. Now we only need a way to set permissions and so retained setPermissions and related api but moved them to FileUtil.
        • Found a bug related to task localization. With the attached patch, we create attempt directories also on all the disks. This has to be done because job directory is writable only by the TT and so the child cannot create non-existing attempt dirs on certain disks on demand.

        Freezing up any further changes till review comments are put up.

        Show
        Vinod Kumar Vavilapalli added a comment - Adding new patches: Cleaned up the common part of the code a little. After going through the patch, realized that the changes to LocalDirAllocator can be done away with and so reverted most of the changes. Now we only need a way to set permissions and so retained setPermissions and related api but moved them to FileUtil. Found a bug related to task localization. With the attached patch, we create attempt directories also on all the disks. This has to be done because job directory is writable only by the TT and so the child cannot create non-existing attempt dirs on certain disks on demand. Freezing up any further changes till review comments are put up.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Uploading patches that include the rest of the comments, except

        # It will be nice to combine the APIs for creating files/directories and setting appropriate permissions all in one API

        There are very few occurrences of above now in the code. So dropping this.

        The patches can be reviewed, while I try any possible code reuse/refactoring and making test-cases comprehensive.

        Show
        Vinod Kumar Vavilapalli added a comment - Uploading patches that include the rest of the comments, except # It will be nice to combine the APIs for creating files/directories and setting appropriate permissions all in one API There are very few occurrences of above now in the code. So dropping this. The patches can be reviewed, while I try any possible code reuse/refactoring and making test-cases comprehensive.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Adding patches for quick review. Still a work in progress. Incorporated most of Devaraj's comments.

        Two major changes from the previous patch:

        • Pulled taskTracker/jobcache/attemp-id/work to taskTracker/jobcache/task-work/attempt-id/work. This is done to solve issues with jvm reuse. The work dir needs to be shared across tasks with jvm reuse and so cannot be finalized to be owned back by the TT on task finish. Pulling this out makes things clean, the original attempt-dir is finalized, and the task-work directory is cleaned up on jvm exit. TODO: task-work can have better name, perhaps jvmcache.
        • log-dir/userlogs/attempt-id will still have 777 permissions with all the files inside created by linuxTaskController but owned by the user. This is done because
          • attempt-id dir has to be writable by the child for log.tmp which is done periodically in synclogs.
          • attempt-dir has to be readable by TT for log serving and writable by TT also by cleanup.

        Long time, the correct solution is for this directory to be owned by the child, but cleaned up by TT using a LinuxTaskController binary lauch to swith user.

        Pending items from Devaraj's comments:

        • It will be nice to combine the APIs for creating files/directories and setting appropriate permissions all in one API
        • DiskChecker.java has lot of code to do with permissions handling to make it generic but not everything would be actually used. In fact, the approach taken in making some APIs generic is debatable. We might as well keep it simple for now and extend those APIs as and when required.
        • LocalDirAllocator.getSecureLocalPathForWrite could be renamed as getPrivateLocalPathForEWrite

        Extra:

        • Test TaskTracker.initializeJobDirs
        • Extra tests in taskcontroller.c
        • Test various scenarios of finalizeTaskDirs
        • Finish tests for jvm reuse

        There are some minor TODO's also left to do.

        Show
        Vinod Kumar Vavilapalli added a comment - Adding patches for quick review. Still a work in progress. Incorporated most of Devaraj's comments. Two major changes from the previous patch: Pulled taskTracker/jobcache/attemp-id/work to taskTracker/jobcache/task-work/attempt-id/work . This is done to solve issues with jvm reuse. The work dir needs to be shared across tasks with jvm reuse and so cannot be finalized to be owned back by the TT on task finish. Pulling this out makes things clean, the original attempt-dir is finalized, and the task-work directory is cleaned up on jvm exit. TODO: task-work can have better name, perhaps jvmcache. log-dir/userlogs/attempt-id will still have 777 permissions with all the files inside created by linuxTaskController but owned by the user. This is done because attempt-id dir has to be writable by the child for log.tmp which is done periodically in synclogs. attempt-dir has to be readable by TT for log serving and writable by TT also by cleanup. Long time, the correct solution is for this directory to be owned by the child, but cleaned up by TT using a LinuxTaskController binary lauch to swith user. Pending items from Devaraj's comments: It will be nice to combine the APIs for creating files/directories and setting appropriate permissions all in one API DiskChecker.java has lot of code to do with permissions handling to make it generic but not everything would be actually used. In fact, the approach taken in making some APIs generic is debatable. We might as well keep it simple for now and extend those APIs as and when required. LocalDirAllocator.getSecureLocalPathForWrite could be renamed as getPrivateLocalPathForEWrite Extra: Test TaskTracker.initializeJobDirs Extra tests in taskcontroller.c Test various scenarios of finalizeTaskDirs Finish tests for jvm reuse There are some minor TODO's also left to do.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Documenting more things that need to be done. These came out in a short discussion with Hemanth/Sreekanth.

        • Userlogs should be cleanable by the TT which isn't possible with the last patch as attempt-dirs in userlogs is owned by the user.
        • Jvm-reuse doesn't work with the current patch.
        Show
        Vinod Kumar Vavilapalli added a comment - Documenting more things that need to be done. These came out in a short discussion with Hemanth/Sreekanth. Userlogs should be cleanable by the TT which isn't possible with the last patch as attempt-dirs in userlogs is owned by the user. Jvm-reuse doesn't work with the current patch.
        Hide
        Devaraj Das added a comment -

        On the mapreduce part of the patch,
        1) TaskTracker.getIntermediateOutputDirForChild can be renamed to getBaseIntermediateOutputDir
        2) TaskTracker.initializeJobDirs should delete stuff on all the disks, but should return successfully if it is able to create the path on at least one disk
        3) job-id/work should be user owned?
        4) Check whether the call to mkdirs(workDir) in localizeJob is required?
        5) There is a spurious LOG.info(" From fComf " + fConf.get("mapred.local.dir")); in localizeTask. Remove it.
        6) Files created within private directories don't have to have 700 permissions. Remove all such permissions settings (e.g., this applies to localTaskFile in localizeTask)
        7) The work-dir in the attempt directory should be readable/writable by all tasks that the JVM runs (in the jvm reuse case). Currently, jvmManager.finalizeTaskDirs will prevent this from happening.
        8) It'd be good to remove redundant calls to finalizeTaskDirs in JvmManager.
        9) It will be nice to combine the APIs for creating files/directories and setting appropriate permissions all in one API
        10) Check if the configuration setting of mapred.local.dir for the child can be done in the TaskTracker before launching the task (instead of the task itself doing that)
        11) There is a spurious LOG.info in Child. Remove that or reword that.

        I haven't looked at the TaskController/LinuxTaskController changes..

        Show
        Devaraj Das added a comment - On the mapreduce part of the patch, 1) TaskTracker.getIntermediateOutputDirForChild can be renamed to getBaseIntermediateOutputDir 2) TaskTracker.initializeJobDirs should delete stuff on all the disks, but should return successfully if it is able to create the path on at least one disk 3) job-id/work should be user owned? 4) Check whether the call to mkdirs(workDir) in localizeJob is required? 5) There is a spurious LOG.info(" From fComf " + fConf.get("mapred.local.dir")); in localizeTask. Remove it. 6) Files created within private directories don't have to have 700 permissions. Remove all such permissions settings (e.g., this applies to localTaskFile in localizeTask) 7) The work-dir in the attempt directory should be readable/writable by all tasks that the JVM runs (in the jvm reuse case). Currently, jvmManager.finalizeTaskDirs will prevent this from happening. 8) It'd be good to remove redundant calls to finalizeTaskDirs in JvmManager. 9) It will be nice to combine the APIs for creating files/directories and setting appropriate permissions all in one API 10) Check if the configuration setting of mapred.local.dir for the child can be done in the TaskTracker before launching the task (instead of the task itself doing that) 11) There is a spurious LOG.info in Child. Remove that or reword that. I haven't looked at the TaskController/LinuxTaskController changes..
        Hide
        Devaraj Das added a comment -

        On the patch that affects common, some comments:
        1) RunJar.unjar should set the permissions for all the entries in the path after expansion
        2) DiskChecker.java has lot of code to do with permissions handling to make it generic but not everything would be actually used. In fact, the approach taken in making some APIs generic is debatable. We might as well keep it simple for now and extend those APIs as and when required.
        3) LocalDirAllocator.getSecureLocalPathForWrite could be renamed as getPrivateLocalPathForEWrite

        Show
        Devaraj Das added a comment - On the patch that affects common, some comments: 1) RunJar.unjar should set the permissions for all the entries in the path after expansion 2) DiskChecker.java has lot of code to do with permissions handling to make it generic but not everything would be actually used. In fact, the approach taken in making some APIs generic is debatable. We might as well keep it simple for now and extend those APIs as and when required. 3) LocalDirAllocator.getSecureLocalPathForWrite could be renamed as getPrivateLocalPathForEWrite
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Few points:

        • After discussing with Hemanth/Sreekanth and looking up the original intentions of HADOOP-4490, I've reworked the above proposal to have very tight(0700) permissions everywhere.
        • But to allow multiple users to create paths, the following paths still have 755 permissions: taskTracker, jobcache, job directory, root-log-dir and distribute cache directory.
        • We need more work to secure the logs. In these patches, log directories of attempts inside userlogs are owned by the user, but still have 755 permissions fo the TT to read them back to serve via TaskLogServlet.

        Here's what the patches do:

        common-patch:

        • Changes LocalDirAllocator and DiskChecker to changes permissions of the paths created using java api to set permission instead of spawning chmod processes.

        mapreduce-patch:

        • Changes TaskTracker related classes to create secure paths with 700 permissions by calling LocalDirAllocator.getSecurePathForWrite() instead of the usual LocalDirAllocator.getLocalPathForWrite()
        • Splits TaskTracker.getIntermediateOutputDir() into TaskTracker.getIntermediateOutputDirForChild and TaskTracker.getIntermediateOutputDirForTT() as now, child's mapred.local.dir is modified so as to sandbox it for preserving secure permissions.
        • Makes changes to LinuxTaskController binary to do the following:
          • Change ownership of attempt-dir, jars-dir, log-dir to be owned by the child at jvm startup.
          • Finalize task-directories to be owned back by the TT when a task finishes for output serving and cleaning up.
          • Finalizing job-dirs to be owned by the TT when the job finishes.
        • Adds a new c based testing for the LinuxTaskController and makes relevant changes to the build.xml, MakeFile, configure.ac. This test can be run by the target test-task-controller.
        • Modifies tests inheriting from ClusterWithLinuxTaskController to reflect the above changes.
        • Incorporates changes needed for MAPREDUCE-131.

        I've tested the patch by running the LinuxTaskController specific tests and observing the directory structure over time. All the tests with successful, failed, killed jobs, with and without reuse pass in the sense that throughout a job's timeline, any path is either owned by the TT or the child and has the most secure permissions possible.

        Show
        Vinod Kumar Vavilapalli added a comment - Few points: After discussing with Hemanth/Sreekanth and looking up the original intentions of HADOOP-4490 , I've reworked the above proposal to have very tight(0700) permissions everywhere. But to allow multiple users to create paths, the following paths still have 755 permissions: taskTracker, jobcache, job directory, root-log-dir and distribute cache directory. We need more work to secure the logs. In these patches, log directories of attempts inside userlogs are owned by the user, but still have 755 permissions fo the TT to read them back to serve via TaskLogServlet. Here's what the patches do: common-patch: Changes LocalDirAllocator and DiskChecker to changes permissions of the paths created using java api to set permission instead of spawning chmod processes. mapreduce-patch: Changes TaskTracker related classes to create secure paths with 700 permissions by calling LocalDirAllocator.getSecurePathForWrite() instead of the usual LocalDirAllocator.getLocalPathForWrite() Splits TaskTracker.getIntermediateOutputDir() into TaskTracker.getIntermediateOutputDirForChild and TaskTracker.getIntermediateOutputDirForTT() as now, child's mapred.local.dir is modified so as to sandbox it for preserving secure permissions. Makes changes to LinuxTaskController binary to do the following: Change ownership of attempt-dir, jars-dir, log-dir to be owned by the child at jvm startup. Finalize task-directories to be owned back by the TT when a task finishes for output serving and cleaning up. Finalizing job-dirs to be owned by the TT when the job finishes. Adds a new c based testing for the LinuxTaskController and makes relevant changes to the build.xml, MakeFile, configure.ac. This test can be run by the target test-task-controller. Modifies tests inheriting from ClusterWithLinuxTaskController to reflect the above changes. Incorporates changes needed for MAPREDUCE-131 . I've tested the patch by running the LinuxTaskController specific tests and observing the directory structure over time. All the tests with successful, failed, killed jobs, with and without reuse pass in the sense that throughout a job's timeline, any path is either owned by the TT or the child and has the most secure permissions possible.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Attaching two files - one for common project and one for mapreduce - for review.

        Show
        Vinod Kumar Vavilapalli added a comment - Attaching two files - one for common project and one for mapreduce - for review.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Some (broad) proposals for solving this issue:

        Localization

        (A) Move the whole localization out of the taskTracker o be done as the user.

        • Adv: Because everything is done by the user, there is no hassle of changing permission now and then in TT. We just need to support reading of data back by the TT for serving.
        • Disadv: (As Devaraj pointed out in a quick chat) Synchronizing localization across the different process becomes quite complicated

        (B) Separate tt-only, child-only space from shared space. TT-only and child-only spaces are exclusively for the TT and the child respectively. TT does localization in tt-only area, task-controller binary then moves directory structure to the child only area. The shared space is for the stuff generated by the child for TT and has restricted access (511 on dirs and 444 on files) for TT and others. Even though other users can read this area, they won't be able to delete/write stuff.

        • Adv: Keeps things very simple
        • DisAdv: Sacrifices some of the stiff 700 acess restrictions in favour of a more manageable 511/444 permissions.

        (C) Instead of separating the directory structures completely, use the same for both TT and the user wherever necessary.

        • Adv : Avoids replication of the directory structure
        • DisAdv: Paths closer to the mapred-local-dir are owned by TT and further down the paths are owned by the child. Currently, task use same mapred.local.dir as task-tracker. When tasks need a path for writing their output, the LocalDirAllocator checks write permission on root directory owned by tt only and would fail We will have to handle this by modifying the mapred-local-dir of the child.

        Intermediate output

        • If we chose (A) or (C) for localization, we need to run the task-controller again to make the output accessible to the TT
        • If we chose (B) for localization, intermediate output is automatically available to the TT.

        Task logs

        • If we chose (A) or (C), whenever there is a request for the logs, we need to run the task-controller to run to stream the logs. Logs can be moved to tt-accessible area once task finishes.
        • If we chose (C), task-logs can be put in shared space readable by all users, and so are automatically available.

        Depending on these, I think that even though (B) sacrifices some of the strict 700 restrictions to a more free 511/444, it keeps things simple. But I am open to other proposals too. Thoughts?

        Show
        Vinod Kumar Vavilapalli added a comment - Some (broad) proposals for solving this issue: Localization (A) Move the whole localization out of the taskTracker o be done as the user. Adv: Because everything is done by the user, there is no hassle of changing permission now and then in TT. We just need to support reading of data back by the TT for serving. Disadv: (As Devaraj pointed out in a quick chat) Synchronizing localization across the different process becomes quite complicated (B) Separate tt-only, child-only space from shared space. TT-only and child-only spaces are exclusively for the TT and the child respectively. TT does localization in tt-only area, task-controller binary then moves directory structure to the child only area. The shared space is for the stuff generated by the child for TT and has restricted access (511 on dirs and 444 on files) for TT and others. Even though other users can read this area, they won't be able to delete/write stuff. Adv: Keeps things very simple DisAdv: Sacrifices some of the stiff 700 acess restrictions in favour of a more manageable 511/444 permissions. (C) Instead of separating the directory structures completely, use the same for both TT and the user wherever necessary. Adv : Avoids replication of the directory structure DisAdv: Paths closer to the mapred-local-dir are owned by TT and further down the paths are owned by the child. Currently, task use same mapred.local.dir as task-tracker. When tasks need a path for writing their output, the LocalDirAllocator checks write permission on root directory owned by tt only and would fail We will have to handle this by modifying the mapred-local-dir of the child. Intermediate output If we chose (A) or (C) for localization, we need to run the task-controller again to make the output accessible to the TT If we chose (B) for localization, intermediate output is automatically available to the TT. Task logs If we chose (A) or (C), whenever there is a request for the logs, we need to run the task-controller to run to stream the logs. Logs can be moved to tt-accessible area once task finishes. If we chose (C), task-logs can be put in shared space readable by all users, and so are automatically available. Depending on these, I think that even though (B) sacrifices some of the strict 700 restrictions to a more free 511/444, it keeps things simple. But I am open to other proposals too. Thoughts?
        Hide
        Vinod Kumar Vavilapalli added a comment - - edited

        I am summarizing the state-of-the-art local data management on TT

        Localization of task on the TT:

        • Job localization
          • happens once per job
          • creates taskTracker/jobcache/jobid, tasktracker/jobcache/jobid/work directories recursively
          • downloads job.xml to taskTracker/jobcache/jobid/job.xml and job jar to tasktracer/jobcache/jobid/jars/job.jar
        • Task localization
          • happens once per task
          • creates task's work directory recursively: taskTracker/jobcache/jobid/taskid[.cleanup]/work
          • if needed, localizes archives and files for distributed cache to tasktracker/archive and/or creates symlinks in the task's work directory, and rewrites job.xml
          • creates mapred.child.tmp directory
          • creates hadoop.log.dir|userlogs/taskid/ recursively and marks child to redirect its stdout and stderr to this directory
          • creates taskjvm.sh in case of linux task controller

        Intermediate output files

        • All the intermediate files created by tasks run as user in taskTracker/jobcache/jobid/taskid[.cleanup]/output/ which is recursively created on demand by the child jvm.

        Intermediate output serving

        • TaskTracker directly reads the map-intermediate output files from taskTracker/jobcache/jobid/taskid[.cleanup]/output/ and serves it to reduces via MapOutputServlet

        Task logs' serving

        • Syslogs of tasks are created by the child jvm in hadoop.log.dir|userlogs/taskid as syslog
        • TaskTracker directly reads the tasks' logs and serves it via TaskLogServlet

        In summary, the current directory structure follows. Unless otherwise stated, directories/files are owned by TT but used by both TT and child.

        taskTracker
            |- archive
            |- jobcache
                |- jobid
                    |- work
                    |- job.xml
                    |- jars/job.jar
                    |- taskid[.cleanup]
                        |- work
                            |- job.xml / task-specific
                            |- taskjvm.sh ( created and used by TT)
                            |- output ( owned by child, used by TT)
                                |- all intermediate output files ( owned by child, used by TT)
        
        mapred.child.tmp ( owned and used by child)
        
        hadoop.log.dir|userlogs (owned and used by child)
            |- taskid       (owned and used by child)
                |- stdout   ( owned by child, used by TT)
                |- stderr   ( owned by child, used by TT)
                |- syslog  ( owned by child, used by TT)
        
        Show
        Vinod Kumar Vavilapalli added a comment - - edited I am summarizing the state-of-the-art local data management on TT Localization of task on the TT: Job localization happens once per job creates taskTracker/jobcache/jobid, tasktracker/jobcache/jobid/work directories recursively downloads job.xml to taskTracker/jobcache/jobid/job.xml and job jar to tasktracer/jobcache/jobid/jars/job.jar Task localization happens once per task creates task's work directory recursively: taskTracker/jobcache/jobid/taskid [.cleanup] /work if needed, localizes archives and files for distributed cache to tasktracker/archive and/or creates symlinks in the task's work directory, and rewrites job.xml creates mapred.child.tmp directory creates hadoop.log.dir|userlogs/taskid/ recursively and marks child to redirect its stdout and stderr to this directory creates taskjvm.sh in case of linux task controller Intermediate output files All the intermediate files created by tasks run as user in taskTracker/jobcache/jobid/taskid [.cleanup] /output/ which is recursively created on demand by the child jvm. Intermediate output serving TaskTracker directly reads the map-intermediate output files from taskTracker/jobcache/jobid/taskid [.cleanup] /output/ and serves it to reduces via MapOutputServlet Task logs' serving Syslogs of tasks are created by the child jvm in hadoop.log.dir|userlogs/taskid as syslog TaskTracker directly reads the tasks' logs and serves it via TaskLogServlet In summary, the current directory structure follows. Unless otherwise stated, directories/files are owned by TT but used by both TT and child. taskTracker |- archive |- jobcache |- jobid |- work |- job.xml |- jars/job.jar |- taskid[.cleanup] |- work |- job.xml / task-specific |- taskjvm.sh ( created and used by TT) |- output ( owned by child, used by TT) |- all intermediate output files ( owned by child, used by TT) mapred.child.tmp ( owned and used by child) hadoop.log.dir|userlogs (owned and used by child) |- taskid (owned and used by child) |- stdout ( owned by child, used by TT) |- stderr ( owned by child, used by TT) |- syslog ( owned by child, used by TT)
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Here's an overview of the problem(with most of the input from Sreekanth Ramakrishnan):

        Objective:

        • Setup secure permissions for task files and job files. Secure permissions means permission only to the user running the job (700).

        Problems to solve:
        With HADOOP-4490, framework is in place to run the user's process as the user himself/herself. To continue to run jobs successfully, HADOOP-4490 set permissions of all files to 777. The scope of this issue is o set secure permissions for these files. After setting these,

        • we should be able to run jobs successfully as a user different from the TaskTracker user,
        • all the files the user owns have to be set 700 permissions,
        • TT should continue to be able to serve map intermediate outputs to reduces and serve user-logs via TaskLogServlet.

        Issues involved:

        • handling localization for tasks.
        • handling intermediate output files.
        • handling intermediate output serving.
        Show
        Vinod Kumar Vavilapalli added a comment - Here's an overview of the problem(with most of the input from Sreekanth Ramakrishnan ): Objective : Setup secure permissions for task files and job files. Secure permissions means permission only to the user running the job (700). Problems to solve : With HADOOP-4490 , framework is in place to run the user's process as the user himself/herself. To continue to run jobs successfully, HADOOP-4490 set permissions of all files to 777. The scope of this issue is o set secure permissions for these files. After setting these, we should be able to run jobs successfully as a user different from the TaskTracker user, all the files the user owns have to be set 700 permissions, TT should continue to be able to serve map intermediate outputs to reduces and serve user-logs via TaskLogServlet. Issues involved: handling localization for tasks. handling intermediate output files. handling intermediate output serving.
        Hide
        Kan Zhang added a comment -

        > It will be interesting to see how we will deal with intermediate data (map output).
        I have opened a new JIRA for it. See HADOOP-4991.

        Show
        Kan Zhang added a comment - > It will be interesting to see how we will deal with intermediate data (map output). I have opened a new JIRA for it. See HADOOP-4991 .
        Hide
        Amar Kamat added a comment -

        It will be interesting to see how we will deal with intermediate data (map output). Even if we store intermediate data under user's permission, we will still have to open it for shuffle/reduce.

        Show
        Amar Kamat added a comment - It will be interesting to see how we will deal with intermediate data (map output). Even if we store intermediate data under user's permission, we will still have to open it for shuffle/reduce.

          People

          • Assignee:
            Vinod Kumar Vavilapalli
            Reporter:
            Arun C Murthy
          • Votes:
            1 Vote for this issue
            Watchers:
            10 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development