Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1422

Changing permissions of files/dirs under job-work-dir may be needed sothat cleaning up of job-dir in all mapred-local-directories succeeds always

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      Introduced enableJobForCleanup() api in TaskController. This api enables deletion of stray files (with no write permissions for task-tracker) from job's work dir. Note that the behavior is similar to TaskController#enableTaskForCleanup() except the path on which the 'chmod' is done is the job's work dir.
      Show
      Introduced enableJobForCleanup() api in TaskController. This api enables deletion of stray files (with no write permissions for task-tracker) from job's work dir. Note that the behavior is similar to TaskController#enableTaskForCleanup() except the path on which the 'chmod' is done is the job's work dir.

      Description

      After MAPREDUCE-896, if LinuxTaskController is set in config, task-controller binary is launched for changing permissions of taskAttemptDir and taskWorkDir before cleaning up of these directories sothat cleanup will be succeeded even if user had created files/dirs under taskAttemptDir or taskWorkDir with non-writable permissions. Users can't create files/dirs under job-dir directly as we set 2570 for job-dir. But as job-work-dir has 2770 permissions and user can create files/dirs under job-work-dir with non-writable permissions, Changing permissions of files/dirs under job-work-dir may be needed sothat cleaning up of job-dir in all mapred-local-directories succeeds always.

      1. mapreduce-1422-y20s.patch
        35 kB
        Hemanth Yamijala
      2. mapreduce-1422-v1.5.5.patch
        34 kB
        Amar Kamat
      3. mapreduce-1422-v1.5.4.patch
        35 kB
        Amar Kamat
      4. mapreduce-1422-v1.5.2.patch
        32 kB
        Amar Kamat
      5. mapreduce-1422-test-v1.0.patch
        5 kB
        Amar Kamat
      6. mapreduce-1422-v1.4.2.patch
        35 kB
        Amar Kamat

        Issue Links

          Activity

          Hide
          Amar Kamat added a comment -

          Attaching a patch for review. Changes are as follows

          • Added a hook in task-controller.c for enabling job cleanup.
          • Modified TestTaskTrackerLocalization to check if the default task-controller deletes job-local-dir with files in it such that the task-tracker doesn't have delete permissions.
          • Added a new testcase (TestJobLocalDirCleanupWithLinuxTaskController) to check the above mentioned scenario with LinuxTaskController.
          • Modified the framework to support job-local-dir cleanup

          test-patch passed. All tests except the following passed :

          • TestMiniMRLocalFS
          • TestTTResourceReporting
          • TestTrackerDistributedCacheManager
          • TestFileArgs
          • TestSimulatorEndToEnd
          • TestSimulatorJobTracker
          • TestSimulatorSerialJobSubmission
          • TestSimulatorStressJobSubmission

          All the above failed on trunk too.

          Show
          Amar Kamat added a comment - Attaching a patch for review. Changes are as follows Added a hook in task-controller.c for enabling job cleanup. Modified TestTaskTrackerLocalization to check if the default task-controller deletes job-local-dir with files in it such that the task-tracker doesn't have delete permissions. Added a new testcase (TestJobLocalDirCleanupWithLinuxTaskController) to check the above mentioned scenario with LinuxTaskController. Modified the framework to support job-local-dir cleanup test-patch passed. All tests except the following passed : TestMiniMRLocalFS TestTTResourceReporting TestTrackerDistributedCacheManager TestFileArgs TestSimulatorEndToEnd TestSimulatorJobTracker TestSimulatorSerialJobSubmission TestSimulatorStressJobSubmission All the above failed on trunk too.
          Hide
          Amar Kamat added a comment -

          Attaching an example test-job that can be run on a cluster to verify the fix. Here are the steps to verify :
          1) Reproduce the bug

          1) Apply this patch.
          2) Deploy the cluster
          3) run './bin/hadoop jar build/hadoop-mapred-test-*.jar badjobexample -m 1 -mt 10000 -recordt 10000'
          4) While the job is running, check ${job-work-dir}/subDir/ on the tasktracker
          5) Even after the job completes, some files are left behind in the job-work-dir.
          

          2) Verify the fix
          Apply the patch along with the attached patch and repeat the steps mentioned above. With this patch, the job-work-dir is properly cleanup as expected.

          Note : job-work-dir for the default case would look something like

          {mapred.local.dir}

          /taskTracker/

          {user.name}

          /jobcache/

          {job.id}

          /work/

          Show
          Amar Kamat added a comment - Attaching an example test-job that can be run on a cluster to verify the fix. Here are the steps to verify : 1) Reproduce the bug 1) Apply this patch. 2) Deploy the cluster 3) run './bin/hadoop jar build/hadoop-mapred-test-*.jar badjobexample -m 1 -mt 10000 -recordt 10000' 4) While the job is running, check ${job-work-dir}/subDir/ on the tasktracker 5) Even after the job completes, some files are left behind in the job-work-dir. 2) Verify the fix Apply the patch along with the attached patch and repeat the steps mentioned above. With this patch, the job-work-dir is properly cleanup as expected. Note : job-work-dir for the default case would look something like {mapred.local.dir} /taskTracker/ {user.name} /jobcache/ {job.id} /work/
          Hide
          Hemanth Yamijala added a comment -

          I looked at this patch. It seems pretty close. A few very minor comments:

          • enable_path_for_cleanup can be marked static as it is not supposed to be used outside taskcontroller.c
          • Can we refactor the code in LinuxTaskController.enable {Job|Task}

            ForCleanup a little bit to avoid duplication to the extent possible:

            private void enablePathForCleanup(String user, PathDeletionContext context, TaskControllerCommands command, List<String> commandArgs) {
              // move common code...
            }
            
            void enableJobForCleanup(PathDeletionContext context) {
                // check type of context and cast
                // build args
                List<String> args = buildJobCleanupArgs(context);
                enablePathForCleanup(jContext.user, context, TaskControllerCommands.ENABLE_JOB_FOR_CLEANUP, args);
            }
            
          • I would suggest that instead of combining the test case for job work directory cleanup verification with existing tests, we define a new testcase like testJobDirectory cleanup. In this, we can do the following:
            • Write files into job work directory without write permissions (as done in the patch)
            • Write files into task directory without write permissions - This simulates a scenario where map tasks are not cleaned up until the Job cleanup command comes to the tasktracker.
              Then we can use the model of checking for stale paths in the cleanup queue to verify cleanup has happened successfully.
          • Also, I don't think we need the new class TestJobLocalDirCleanupWithLinuxTaskController. Note there already exists a class TestLocalizationWithLinuxTaskController that extends TestTaskTrackerLocalization. This would automatically mean that all tests in TestTaskTrackerLocalization get executed along with LinuxTaskController as well. Can you please confirm this ?
          Show
          Hemanth Yamijala added a comment - I looked at this patch. It seems pretty close. A few very minor comments: enable_path_for_cleanup can be marked static as it is not supposed to be used outside taskcontroller.c Can we refactor the code in LinuxTaskController.enable {Job|Task} ForCleanup a little bit to avoid duplication to the extent possible: private void enablePathForCleanup( String user, PathDeletionContext context, TaskControllerCommands command, List< String > commandArgs) { // move common code... } void enableJobForCleanup(PathDeletionContext context) { // check type of context and cast // build args List< String > args = buildJobCleanupArgs(context); enablePathForCleanup(jContext.user, context, TaskControllerCommands.ENABLE_JOB_FOR_CLEANUP, args); } I would suggest that instead of combining the test case for job work directory cleanup verification with existing tests, we define a new testcase like testJobDirectory cleanup. In this, we can do the following: Write files into job work directory without write permissions (as done in the patch) Write files into task directory without write permissions - This simulates a scenario where map tasks are not cleaned up until the Job cleanup command comes to the tasktracker. Then we can use the model of checking for stale paths in the cleanup queue to verify cleanup has happened successfully. Also, I don't think we need the new class TestJobLocalDirCleanupWithLinuxTaskController. Note there already exists a class TestLocalizationWithLinuxTaskController that extends TestTaskTrackerLocalization. This would automatically mean that all tests in TestTaskTrackerLocalization get executed along with LinuxTaskController as well. Can you please confirm this ?
          Hide
          Amar Kamat added a comment -

          Hemanth,
          I am confused w.r.t point #3 and #4. I added test-code in TestTaskTrackerLocalization for testing job-work-dir cleanup because there is a call to tracker.removeJobFiles(). Now TestLocalizationWithLinuxTaskController (that extends TestTaskTrackerLocalization) will also test the job-work-dir cleanup code with LinuxTaskController. As you are suggesting, if we move this job cleanup test-code into a separate testcase, then we need TestJobLocalDirCleanupWithLinuxTaskController, no?

          Another way to do this would be to introduce a new testcase (say TestJobDirCleanup) as you suggested which will test the job-dir and task-dirs upon job completion. Also add a testcase (say TestJobDirCleanupWithLinuxTaskController) extending TestJobDirCleanup but configured with LinuxTaskController. Thoughts?

          Show
          Amar Kamat added a comment - Hemanth, I am confused w.r.t point #3 and #4. I added test-code in TestTaskTrackerLocalization for testing job-work-dir cleanup because there is a call to tracker.removeJobFiles() . Now TestLocalizationWithLinuxTaskController (that extends TestTaskTrackerLocalization) will also test the job-work-dir cleanup code with LinuxTaskController. As you are suggesting, if we move this job cleanup test-code into a separate testcase, then we need TestJobLocalDirCleanupWithLinuxTaskController, no? Another way to do this would be to introduce a new testcase (say TestJobDirCleanup) as you suggested which will test the job-dir and task-dirs upon job completion. Also add a testcase (say TestJobDirCleanupWithLinuxTaskController) extending TestJobDirCleanup but configured with LinuxTaskController. Thoughts?
          Hide
          Hemanth Yamijala added a comment -

          Amar,

          I suppose I might have been misleading using the term 'test case'. I did not mean test case as a separate class. I meant it as a separate test method in the same class TestTrackerTrackerLocalization. The motivation is to keep unit tests focused on testing one thing at a time, so it will be easier to find what failed when there is a failure. Currently since the existing test methods that you modified all test task cleanup, I suggested you move the code to a separate test method focusing on testing job cleanup.

          If we do this, we don't need TestJobLocalDirCleanupWithLinuxTaskController, right ?

          Does this make sense now ?

          Show
          Hemanth Yamijala added a comment - Amar, I suppose I might have been misleading using the term 'test case'. I did not mean test case as a separate class. I meant it as a separate test method in the same class TestTrackerTrackerLocalization. The motivation is to keep unit tests focused on testing one thing at a time, so it will be easier to find what failed when there is a failure. Currently since the existing test methods that you modified all test task cleanup, I suggested you move the code to a separate test method focusing on testing job cleanup. If we do this, we don't need TestJobLocalDirCleanupWithLinuxTaskController, right ? Does this make sense now ?
          Hide
          Amar Kamat added a comment -

          Attaching a new patch for review after incorporating Hemanth's review comments. test-patch and ant tests passed on my box. Cluster test using BadJobExample.java also passed.

          Show
          Amar Kamat added a comment - Attaching a new patch for review after incorporating Hemanth's review comments. test-patch and ant tests passed on my box. Cluster test using BadJobExample.java also passed.
          Hide
          Ravi Gummadi added a comment -

          Patch looks good overall.

          One minor comment: Please change the method name buildTaskControllerPathDeletionContexts() to buildTaskControllerTaskPathDeletionContexts() in TaskTracker.java for better clarity and consistency.

          Show
          Ravi Gummadi added a comment - Patch looks good overall. One minor comment: Please change the method name buildTaskControllerPathDeletionContexts() to buildTaskControllerTaskPathDeletionContexts() in TaskTracker.java for better clarity and consistency.
          Hide
          Hemanth Yamijala added a comment -

          Amar, I think you need to write files into the task directory and not task-work directory to simulate the case I mentioned in my comments above. AFAIK, task work directory is always cleaned up, but task directory is not cleaned up sometimes until a job cleanup happens. I would request we make the test to simulate the exact scenario. Also, just so to avoid false positives concerning this kind of loop:

              for (Path p : tPaths) {
                if (tracker.getLocalFileSystem().exists(p)) {
                  createFileAndSetPermissions(localizedJobConf, p);
                }
              }
          

          can you please verify at least one path existed ? I am worried that if for some reason no paths exist, then the code will run through without doing any special removal at all.

          Show
          Hemanth Yamijala added a comment - Amar, I think you need to write files into the task directory and not task-work directory to simulate the case I mentioned in my comments above. AFAIK, task work directory is always cleaned up, but task directory is not cleaned up sometimes until a job cleanup happens. I would request we make the test to simulate the exact scenario. Also, just so to avoid false positives concerning this kind of loop: for (Path p : tPaths) { if (tracker.getLocalFileSystem().exists(p)) { createFileAndSetPermissions(localizedJobConf, p); } } can you please verify at least one path existed ? I am worried that if for some reason no paths exist, then the code will run through without doing any special removal at all.
          Hide
          Amar Kamat added a comment -

          Please change the method name buildTaskControllerPathDeletionContexts() to buildTaskControllerTaskPathDeletionContexts()

          I kept it that way to avoid unnecessary diffs in my patch. Let me try to change it in this patch.

          AFAIK, task work directory is always cleaned up, but task directory is not cleaned up sometimes until a job cleanup happens

          Can you tell me a scenario where this can happen. But it better we add files to task dir too.

          Also, just so to avoid false positives concerning this kind of loop

          Just to add more confidence to this patch, I added log statements and checked it. I will add checks to make sure that atleast one file gets written. I see similar code in TestTaskTrackerLocalization#validateRemoveFiles(). I will add checks there too. Thoughts?

          Show
          Amar Kamat added a comment - Please change the method name buildTaskControllerPathDeletionContexts() to buildTaskControllerTaskPathDeletionContexts() I kept it that way to avoid unnecessary diffs in my patch. Let me try to change it in this patch. AFAIK, task work directory is always cleaned up, but task directory is not cleaned up sometimes until a job cleanup happens Can you tell me a scenario where this can happen. But it better we add files to task dir too. Also, just so to avoid false positives concerning this kind of loop Just to add more confidence to this patch, I added log statements and checked it. I will add checks to make sure that atleast one file gets written. I see similar code in TestTaskTrackerLocalization#validateRemoveFiles(). I will add checks there too. Thoughts?
          Hide
          Amar Kamat added a comment -

          Attaching a new patch with the following changes

          • added checks to make sure that the job-work-dirs/task-local-dirs contains atleast one path (i.e the false positive concern Hemanth raised)
          • added extra stray files in task-attempt local dir before calling removeJobFiles() to test the case where the jvm reuse is enabled and attempt dirs are left behind to be cleaned up upon kill-job-action (as suggested by Hemanth)
          • renamed the TaskTracker#buildTaskControllerPathDeletionContexts() to TaskTracker#buildTaskControllerTaskPathDeletionContexts() (as suggested by Ravi)

          test-patch passed. All ant tests passed.

          Show
          Amar Kamat added a comment - Attaching a new patch with the following changes added checks to make sure that the job-work-dirs/task-local-dirs contains atleast one path (i.e the false positive concern Hemanth raised) added extra stray files in task-attempt local dir before calling removeJobFiles() to test the case where the jvm reuse is enabled and attempt dirs are left behind to be cleaned up upon kill-job-action (as suggested by Hemanth) renamed the TaskTracker#buildTaskControllerPathDeletionContexts() to TaskTracker#buildTaskControllerTaskPathDeletionContexts() (as suggested by Ravi) test-patch passed. All ant tests passed.
          Hide
          Amar Kamat added a comment -

          MAPREDUCE-927 got committed after I uploaded the patch. Attaching a new patch incorporating changes from the merge. test-patch passed. This is only a test-case change and tested the test-case separately.

          Show
          Amar Kamat added a comment - MAPREDUCE-927 got committed after I uploaded the patch. Attaching a new patch incorporating changes from the merge. test-patch passed. This is only a test-case change and tested the test-case separately.
          Hide
          Hemanth Yamijala added a comment -

          The latest patch looks good to me. +1.

          Note though that:

          added checks to make sure that the job-work-dirs/task-local-dirs contains atleast one path

          This is not checking the existence of paths, the returned array is simply a list of path objects created with no bearing on the physical directory / file creation. However, I think this is still fine because calling localizeJob ensures work directory is created.... this is tested in testJobLocalization(). Likewise, calling localizeTask ensures attempt directory is created.. this is tested in testTaskLocalization. So, I think it is OK even to not check these conditions in the new test method.

          Show
          Hemanth Yamijala added a comment - The latest patch looks good to me. +1. Note though that: added checks to make sure that the job-work-dirs/task-local-dirs contains atleast one path This is not checking the existence of paths, the returned array is simply a list of path objects created with no bearing on the physical directory / file creation. However, I think this is still fine because calling localizeJob ensures work directory is created.... this is tested in testJobLocalization(). Likewise, calling localizeTask ensures attempt directory is created.. this is tested in testTaskLocalization. So, I think it is OK even to not check these conditions in the new test method.
          Hide
          Hemanth Yamijala added a comment -

          Trying Hudson.

          Show
          Hemanth Yamijala added a comment - Trying Hudson.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12438295/mapreduce-1422-v1.5.5.patch
          against trunk revision 920793.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/512/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/512/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/512/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/512/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12438295/mapreduce-1422-v1.5.5.patch against trunk revision 920793. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/512/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/512/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/512/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/512/console This message is automatically generated.
          Hide
          Hemanth Yamijala added a comment -

          All LinuxTaskController tests also passed. This is good for commit.

          Show
          Hemanth Yamijala added a comment - All LinuxTaskController tests also passed. This is good for commit.
          Hide
          Hemanth Yamijala added a comment -

          I committed this to trunk. Thanks, Amar !

          Show
          Hemanth Yamijala added a comment - I committed this to trunk. Thanks, Amar !
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #271 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/271/)
          . Fix cleanup of localized job directory to work if files with non-deletable permissions are created within it. Contributed by Amar Kamat.

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #271 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/271/ ) . Fix cleanup of localized job directory to work if files with non-deletable permissions are created within it. Contributed by Amar Kamat.
          Hide
          Hemanth Yamijala added a comment -

          Patch for earlier version of Hadoop. Not for commit here.

          Show
          Hemanth Yamijala added a comment - Patch for earlier version of Hadoop. Not for commit here.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #254 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/254/)
          . Fix cleanup of localized job directory to work if files with non-deletable permissions are created within it. Contributed by Amar Kamat.

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #254 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/254/ ) . Fix cleanup of localized job directory to work if files with non-deletable permissions are created within it. Contributed by Amar Kamat.

            People

            • Assignee:
              Amar Kamat
              Reporter:
              Ravi Gummadi
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development