Hadoop Common / HADOOP-2427

Cleanup of mapred.local.dir after maptask is complete

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.15.1
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      The current working directory of a task, i.e. ${mapred.local.dir}/taskTracker/jobcache/<jobid>/<task_dir>/work, is cleaned up as soon as the task is finished.

      Description

      I see that after a map task completes, its working directory (mapred.local.dir)/taskTracker/jobcache/<jobid>/<task_dir> is not deleted until the job is complete. If map output files are stored there, could they be created in a different directory so that the working directory can be cleaned up after the map task completes? One problem we are seeing is that if a map task creates temporary files, they accumulate and we may run out of disk space, thus failing the job. Relying on the user to clean up all temporary files is error prone.

      1. patch-2427.txt
        5 kB
        Amareshwari Sriramadasu
      2. patch-2427.txt
        4 kB
        Amareshwari Sriramadasu

          Activity

          Arun C Murthy added a comment -

          I just committed this. Thanks, Amareshwari!

          Amareshwari Sriramadasu added a comment -

          Test failure dfs.TestDistributedUpgrade.testDistributedUpgrade is not related to the patch.

          This issue adds cleanup of local directories in the task tracker for successfully completed tasks as soon as the task finishes. It's difficult to write a unit test for this, because all the directories will be deleted at the end of the job anyway. TestMiniMRWithDFS.checkTaskDirectories still validates the task directories.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12382994/patch-2427.txt
          against trunk revision 661918.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2534/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2534/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2534/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2534/console

          This message is automatically generated.

          Amareshwari Sriramadasu added a comment -

          Trying to run Hudson again.

          Amareshwari Sriramadasu added a comment -

          The patch modifies TaskTracker.TaskInProgress.cleanup() to take a boolean parameter indicating whether to delete the whole task directory or only the work directory. For successful task attempts, the work directory is cleaned up as soon as the task is finished.
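
          The behaviour described above can be illustrated with a minimal, standalone sketch. It is not the code from patch-2427.txt: the class name TaskCleanupSketch, the parameter name wholeTaskDir, and the helper deleteRecursively are hypothetical, and plain java.nio.file calls stand in for whatever file-system API the TaskTracker actually uses. Only the "work" and "output" subdirectory names come from this issue.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class TaskCleanupSketch {
    // Name of the per-task working directory, as described in this issue.
    static final String WORKDIR = "work";

    // Hypothetical stand-in for a cleanup method that takes a boolean:
    // true  -> delete the whole task directory,
    // false -> delete only <task_dir>/work and keep everything else.
    static void cleanup(Path taskDir, boolean wholeTaskDir) throws IOException {
        Path target = wholeTaskDir ? taskDir : taskDir.resolve(WORKDIR);
        deleteRecursively(target);
    }

    // Deletes a directory tree bottom-up; does nothing if the path is absent.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway task directory with work/ and output/ inside it.
        Path taskDir = Files.createTempDirectory("task_sketch");
        Files.createDirectories(taskDir.resolve(WORKDIR));
        Files.createDirectories(taskDir.resolve("output"));
        Files.createFile(taskDir.resolve(WORKDIR).resolve("scratch.tmp"));
        Files.createFile(taskDir.resolve("output").resolve("file.out"));

        // Successful task: only the work directory goes away.
        cleanup(taskDir, false);
        System.out.println("work exists:   " + Files.exists(taskDir.resolve(WORKDIR)));
        System.out.println("output exists: " + Files.exists(taskDir.resolve("output")));

        // End of job (or failed/killed task): the whole task directory goes away.
        cleanup(taskDir, true);
        System.out.println("task dir exists: " + Files.exists(taskDir));
    }
}
```

          Keeping a single method with a flag, rather than a second cleanup method, matches the review feedback in this thread that cleanup should stay in one place.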

          Amareshwari Sriramadasu added a comment -

          "Please don't add more to MRConstants.java, WORKDIR should belong to TaskTracker.TaskInProgress."

          WORKDIR is also used in TaskRunner.java, so I think MRConstants.java is the right place for the public static constant.

          Arun C Murthy added a comment -

          I don't think this is the right approach, we really shouldn't be adding more 'cleanup' methods when we already have one in TaskTracker.TaskInProgress, please do the necessary cleanup there if possible.

          Please don't add more to MRConstants.java, WORKDIR should belong to TaskTracker.TaskInProgress.

          Amareshwari Sriramadasu added a comment -

          Trying to run Hudson again.

          Amareshwari Sriramadasu added a comment -

          Currently, failed and killed tasks are cleaned up as soon as they report as finished, but successful tasks are cleaned up only at the end of the job, so that the map outputs remain available to the reducers. Now that each task has <taskid>/work as its current working directory and <taskid>/output for intermediate map output files, <taskid>/work can be cleaned up for successful tasks as soon as the task finishes.

          Here is a patch doing cleanup of workdir for successful tasks.
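
          As a rough illustration of the policy described in this comment (not the patch itself), the sketch below computes which path would be removed when a task attempt finishes. The class TaskDirPolicySketch, the enum TaskState, and both method names are hypothetical; the directory layout follows the release note and the work/output split follows this comment.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class TaskDirPolicySketch {
    // Hypothetical task outcomes, named after the states mentioned in this issue.
    enum TaskState { SUCCEEDED, FAILED, KILLED }

    // <mapred.local.dir>/taskTracker/jobcache/<jobid>/<task_dir>, per the release note.
    static Path taskDir(String localDir, String jobId, String taskId) {
        return Paths.get(localDir, "taskTracker", "jobcache", jobId, taskId);
    }

    // Path to delete as soon as the task reports as finished:
    // - failed or killed: the whole task directory;
    // - succeeded: only work/, so output/ stays available until the job ends.
    static Path pathToDeleteOnFinish(Path taskDir, TaskState state) {
        return state == TaskState.SUCCEEDED ? taskDir.resolve("work") : taskDir;
    }

    public static void main(String[] args) {
        // The local dir, job id, and task id here are made-up example values.
        Path dir = taskDir("/tmp/mapred/local", "job_0001", "task_0001_m_000000_0");
        for (TaskState state : TaskState.values()) {
            System.out.println(state + " -> delete " + pathToDeleteOnFinish(dir, state));
        }
    }
}
```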


            People

            • Assignee:
              Amareshwari Sriramadasu
              Reporter:
              Lohit Vijayarenu