Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-416

Move the completed jobs' history files to a DONE subdirectory inside the configured history directory

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      Hide
      Once the job is done, the history file and associated conf file is moved to history.folder/done folder. This is done to avoid garbling the running jobs' folder and the framework no longer gets affected with the files in the done folder. This helps in 2 was
      1) ls on running folder (recovery) is faster with less files
      2) changes in running folder results into FileNotFoundException.


      So with existing code, the best way to keep the running folder clean is to note the id's of running job and then move files that are not in this list to the done folder. Note that on an avg there will be 2 files in the history folder namely
      1) job history file
      2) conf file.

      With restart, there might be more than 2 files, mostly the extra conf files. In such a case keep the oldest conf file (based on timestamp) and delete the rest. Note that this its better to do this when the jobtracker is down.
      Show
      Once the job is done, the history file and associated conf file is moved to history.folder/done folder. This is done to avoid garbling the running jobs' folder and the framework no longer gets affected with the files in the done folder. This helps in 2 was 1) ls on running folder (recovery) is faster with less files 2) changes in running folder results into FileNotFoundException. So with existing code, the best way to keep the running folder clean is to note the id's of running job and then move files that are not in this list to the done folder. Note that on an avg there will be 2 files in the history folder namely 1) job history file 2) conf file. With restart, there might be more than 2 files, mostly the extra conf files. In such a case keep the oldest conf file (based on timestamp) and delete the rest. Note that this its better to do this when the jobtracker is down.

      Description

      Whenever a job completes, the history file can be moved to a directory called DONE. That would make the management of job history files easier (for example, administrators can move the history files from that directory to some other place, delete them, archive them, etc.).

      1. MAPREDUCE-416-v1.6-branch-0.20.patch
        32 kB
        Amar Kamat
      2. MAPREDUCE-416-v1.6.patch
        31 kB
        Amar Kamat
      3. MAPREDUCE-416-v1.5.patch
        31 kB
        Amar Kamat
      4. MAPREDUCE-416-v1.4.patch
        31 kB
        Amar Kamat
      5. MAPREDUCE-416-v1.3.patch
        31 kB
        Amar Kamat
      6. HADOOP-5994-v2.1.patch
        39 kB
        Amar Kamat
      7. HADOOP-5994-v2.0.patch
        34 kB
        Amar Kamat
      8. HADOOP-5994-v1.8.patch
        22 kB
        Amar Kamat
      9. HADOOP-5994-v1.7.patch
        22 kB
        Amar Kamat

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #15 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/15/)

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #15 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/15/ )
          Hide
          Devaraj Das added a comment -

          I just committed this. Thanks, Amar!

          Show
          Devaraj Das added a comment - I just committed this. Thanks, Amar!
          Hide
          Amar Kamat added a comment -

          Attaching a new patch that fixes issues in contrib projects. Following test-cases failed

          1. TestStreamingExitStatus FAILED
          2. TestStreamingStderr FAILED (timeout)
          3. TestQueueCapacities FAILED (timeout)

          All of these are known issues.

          Also attaching an example patch for branch-20 not to be committed.

          Show
          Amar Kamat added a comment - Attaching a new patch that fixes issues in contrib projects. Following test-cases failed TestStreamingExitStatus FAILED TestStreamingStderr FAILED (timeout) TestQueueCapacities FAILED (timeout) All of these are known issues. Also attaching an example patch for branch-20 not to be committed.
          Hide
          Amar Kamat added a comment -

          Attaching a latest patch. test-patch
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 6 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          Ant tests passed on my box except TestJobTrackerRestartWithLostTracker FAILED (timeout), TestLazyOutput FAILED, TestReduceFetch FAILED.

          Running contrib tests.

          Show
          Amar Kamat added a comment - Attaching a latest patch. test-patch [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Ant tests passed on my box except TestJobTrackerRestartWithLostTracker FAILED (timeout), TestLazyOutput FAILED, TestReduceFetch FAILED. Running contrib tests.
          Hide
          Amar Kamat added a comment -

          Attaching a new patch with a bug fix. Result of test-patch
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 6 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Running ant tests now.

          Show
          Amar Kamat added a comment - Attaching a new patch with a bug fix. Result of test-patch [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Running ant tests now.
          Hide
          Amar Kamat added a comment -

          Attaching a latest patch. Testing in progress.

          Show
          Amar Kamat added a comment - Attaching a latest patch. Testing in progress.
          Hide
          Amar Kamat added a comment -

          New patch with a bug fix, Result of test-patch
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 6 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Running ant test now.

          Show
          Amar Kamat added a comment - New patch with a bug fix, Result of test-patch [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Running ant test now.
          Hide
          Amar Kamat added a comment -

          Attaching a new patch incorporating Devaraj's offline comments. Result of test-patch
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 6 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Show
          Amar Kamat added a comment - Attaching a new patch incorporating Devaraj's offline comments. Result of test-patch [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          Hide
          Iyappan Srinivasan added a comment -

          The previous test comment was put by me.

          Show
          Iyappan Srinivasan added a comment - The previous test comment was put by me.
          Hide
          Hadoop QA added a comment -

          Tested these scenarios and found them to pass:

          1) Launch a job, allow it to complete and then verify if the related
          files move from history directory to done directory.

          2) Launch a job, kill it in the middle and verify if the related
          files move from history directory to done directory.

          3) Launch multiple jobs (say 5) simultaneously and kill them as soon
          as they start and verify if the related files move from history
          directory to done directory.

          4) Remove "done" directory and then kill the job to verify if done
          directory is created again.

          5) Kill a task before it says "setup successful".
          Expected behaviour: It moves the related files to done directory.
          Observed behaviour : same

          6) Older files under done folder can be removed without any issue to
          jobtracker.

          Expected behaviour: Files from "done" folder have to be removed after
          a specific time and job tracker should not be affected.

          Observed behaviour: Older files from "done" folder are removed after
          a successful job has been executed. job tracker si not affected by this.
          But the files don't get removed when jobs are killed. For this a Jira was raised:
          https://issues.apache.org/jira/browse/HADOOP-6050

          But the overall testscenario passes as the files in done directory is removed both by manually deleting it or by Jobtracker filecleaner deleting it, without affecting the running jobtracker.

          7) The webui of "Job History" works, whether the job is killed or it
          completes successfully. All the links in that "Job History" page is
          working.

          8) The in memory (running) jobs, shown in Jobtracker ui, works
          correctly. I checked all links.

          Show
          Hadoop QA added a comment - Tested these scenarios and found them to pass: 1) Launch a job, allow it to complete and then verify if the related files move from history directory to done directory. 2) Launch a job, kill it in the middle and verify if the related files move from history directory to done directory. 3) Launch multiple jobs (say 5) simultaneously and kill them as soon as they start and verify if the related files move from history directory to done directory. 4) Remove "done" directory and then kill the job to verify if done directory is created again. 5) Kill a task before it says "setup successful". Expected behaviour: It moves the related files to done directory. Observed behaviour : same 6) Older files under done folder can be removed without any issue to jobtracker. Expected behaviour: Files from "done" folder have to be removed after a specific time and job tracker should not be affected. Observed behaviour: Older files from "done" folder are removed after a successful job has been executed. job tracker si not affected by this. But the files don't get removed when jobs are killed. For this a Jira was raised: https://issues.apache.org/jira/browse/HADOOP-6050 But the overall testscenario passes as the files in done directory is removed both by manually deleting it or by Jobtracker filecleaner deleting it, without affecting the running jobtracker. 7) The webui of "Job History" works, whether the job is killed or it completes successfully. All the links in that "Job History" page is working. 8) The in memory (running) jobs, shown in Jobtracker ui, works correctly. I checked all links.
          Hide
          Amar Kamat added a comment -

          Ant tests passed on my box.

          Show
          Amar Kamat added a comment - Ant tests passed on my box.
          Hide
          Amar Kamat added a comment -

          Fixed a small bug w.r.t. killing of un-initialized jobs. Result of test-patch
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 6 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Runnning ant-test now

          Show
          Amar Kamat added a comment - Fixed a small bug w.r.t. killing of un-initialized jobs. Result of test-patch [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Runnning ant-test now
          Hide
          Amar Kamat added a comment -

          Result of test-patch
          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 6 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Running ant test now.

          Show
          Amar Kamat added a comment - Result of test-patch [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Running ant test now.
          Hide
          Amar Kamat added a comment -

          Attaching a patch that moves completed jobs to a done subfolder. Changes the testcases accordingly. Following are the changes

          1. jobhistory now maintains an internal mapping from job-id to jobhistory-filename
          2. finalize recovery reuses these cached values for renaming
          3. jobtracker now invoked jobhistory.markAsCompleted() which moves the job to the done folder. the config files are also moved to the done folder
          4. jobhistory's searching api's namely getJobHistoryFilename() and getJobHistoryLogLocation() now expect a boolean indicating if the job is runnning or complete
          5. added a check in TestJobHistory to check if the filename returned by the framework after job completion belongs to a done folder or not, files are removed from the running folder. Same is also tested for config files too
          6. HistoryCleaner now runs on done folder
          7. Made sure that once the job files are moved to done folder, the framework (except historycleaner) is not affected by changes to done folder. History cleaner will fail and re-run upon an IOException.

          Running test-patch

          Tested this patch on a single node cluster and it works fine.

          Show
          Amar Kamat added a comment - Attaching a patch that moves completed jobs to a done subfolder. Changes the testcases accordingly. Following are the changes jobhistory now maintains an internal mapping from job-id to jobhistory-filename finalize recovery reuses these cached values for renaming jobtracker now invoked jobhistory.markAsCompleted() which moves the job to the done folder. the config files are also moved to the done folder jobhistory's searching api's namely getJobHistoryFilename() and getJobHistoryLogLocation() now expect a boolean indicating if the job is runnning or complete added a check in TestJobHistory to check if the filename returned by the framework after job completion belongs to a done folder or not, files are removed from the running folder. Same is also tested for config files too HistoryCleaner now runs on done folder Made sure that once the job files are moved to done folder, the framework (except historycleaner) is not affected by changes to done folder. History cleaner will fail and re-run upon an IOException. Running test-patch Tested this patch on a single node cluster and it works fine.
          Hide
          Amar Kamat added a comment -

          That would make the management of job history files easier (for example, administrators can move the history files from that directory to some other place, delete them, archive them, etc.).

          I think for this we need to move completed files to a done folder.

          Show
          Amar Kamat added a comment - That would make the management of job history files easier (for example, administrators can move the history files from that directory to some other place, delete them, archive them, etc.). I think for this we need to move completed files to a done folder.
          Hide
          Amar Kamat added a comment -

          Moving completed jobs to a done folder seems to be a big change to me. All the system build around hadoop will change. I propose to have running jobs in a running folder and move the history file for completed jobs to the parent folder. This will involve lesser changes w.r.t external systems and testcases and also all the changes that will be needed will be during job submission/recovery only. Thoughts?

          Show
          Amar Kamat added a comment - Moving completed jobs to a done folder seems to be a big change to me. All the system build around hadoop will change. I propose to have running jobs in a running folder and move the history file for completed jobs to the parent folder. This will involve lesser changes w.r.t external systems and testcases and also all the changes that will be needed will be during job submission/recovery only. Thoughts?

            People

            • Assignee:
              Amar Kamat
              Reporter:
              Devaraj Das
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development