Hadoop Map/Reduce / MAPREDUCE-416

Move the completed jobs' history files to a DONE subdirectory inside the configured history directory

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      Once a job is done, its history file and the associated conf file are moved to the history.folder/done folder. This keeps completed jobs' files from cluttering the running jobs' folder, and the framework is no longer affected by the files in the done folder. This helps in two ways:
      1) listing the running folder (during recovery) is faster because it holds fewer files
      2) completed jobs' files can be managed in the done folder, whereas changes in the running folder can result in a FileNotFoundException.
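      The move described above can be sketched as follows. This is a minimal, hypothetical illustration, not the JobTracker's code: `DoneFolderMover`, `moveToDone` and `belongsTo` are invented names, the sketch works on a local directory through java.nio.file rather than through Hadoop's FileSystem API, and it assumes every file belonging to a job contains `_<jobId>_` in its name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a Hadoop class): moves a finished job's history
 * and conf files from the running history folder into its "done" subfolder.
 */
final class DoneFolderMover {
    private DoneFolderMover() {}

    /** Assumption: a job's file names contain "_<jobId>_". */
    static boolean belongsTo(String fileName, String jobId) {
        return fileName.contains("_" + jobId + "_");
    }

    /** Moves every plain file of the given job into historyDir/done and returns the new paths. */
    static List<Path> moveToDone(Path historyDir, String jobId) {
        List<Path> moved = new ArrayList<>();
        try {
            Path doneDir = Files.createDirectories(historyDir.resolve("done"));
            // Collect first so the directory is not modified while it is being listed.
            List<Path> jobFiles = new ArrayList<>();
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(historyDir)) {
                for (Path entry : entries) {
                    // Only plain files are considered, so the done folder itself is skipped.
                    if (Files.isRegularFile(entry)
                            && belongsTo(entry.getFileName().toString(), jobId)) {
                        jobFiles.add(entry);
                    }
                }
            }
            for (Path file : jobFiles) {
                Path target = doneDir.resolve(file.getFileName());
                Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
                moved.add(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return moved;
    }
}
```

      Because the sketch considers only regular files, the `done` subdirectory is never mistaken for a job file when the running folder is listed, and files of other jobs stay where they are.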


      So, for history folders written by the existing code, the best way to clean up the running folder is to note the IDs of the running jobs and then move the files that do not belong to those jobs to the done folder. Note that on average there are two files per job in the history folder, namely
      1) job history file
      2) conf file.

      After a restart, there may be more than two files per job, usually extra conf files. In that case, keep the oldest conf file (based on timestamp) and delete the rest. It is better to do this cleanup while the JobTracker is down.
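      The manual cleanup described above can be sketched as an offline pass over the history folder, run while the JobTracker is down. This is a hypothetical illustration, not a Hadoop tool: `HistoryFolderCleaner` is an invented name, it works on a local directory through java.nio.file rather than Hadoop's FileSystem API, it takes the set of running job IDs as input, it assumes conf files end in `_conf.xml` and that job IDs appear in file names in the `job_<digits>_<digits>` form, and it uses each file's modification time as the "timestamp" for choosing the oldest conf file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical offline cleanup of a pre-existing history folder.
 * Assumptions: job IDs look like job_<digits>_<digits>, conf files end in
 * "_conf.xml", and the last-modified time stands in for "timestamp".
 */
final class HistoryFolderCleaner {
    private static final Pattern JOB_ID = Pattern.compile("job_\\d+_\\d+");

    private HistoryFolderCleaner() {}

    static void clean(Path historyDir, Set<String> runningJobIds) {
        try {
            Path doneDir = Files.createDirectories(historyDir.resolve("done"));
            // Group the plain files in the running folder by the job ID in their names.
            Map<String, List<Path>> filesByJob = new HashMap<>();
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(historyDir)) {
                for (Path entry : entries) {
                    if (!Files.isRegularFile(entry)) continue;
                    Matcher m = JOB_ID.matcher(entry.getFileName().toString());
                    if (m.find()) {
                        filesByJob.computeIfAbsent(m.group(), k -> new ArrayList<>()).add(entry);
                    }
                }
            }
            for (Map.Entry<String, List<Path>> job : filesByJob.entrySet()) {
                List<Path> kept = new ArrayList<>();
                List<Path> confs = new ArrayList<>();
                for (Path file : job.getValue()) {
                    if (file.getFileName().toString().endsWith("_conf.xml")) confs.add(file);
                    else kept.add(file);
                }
                // Keep only the oldest conf file; the extra copies come from restarts.
                confs.sort(Comparator.comparing(HistoryFolderCleaner::modifiedTime));
                for (int i = 1; i < confs.size(); i++) Files.delete(confs.get(i));
                if (!confs.isEmpty()) kept.add(confs.get(0));
                // Files of jobs that are not in the running list go to the done folder.
                if (!runningJobIds.contains(job.getKey())) {
                    for (Path file : kept) {
                        Files.move(file, doneDir.resolve(file.getFileName()),
                                StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static FileTime modifiedTime(Path file) {
        try {
            return Files.getLastModifiedTime(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      Duplicate conf files are pruned before anything is moved, so each completed job reaches the done folder with exactly one conf file, while the files of running jobs stay in the running folder.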

      Description

      Whenever a job completes, the history file can be moved to a directory called DONE. That would make the management of job history files easier (for example, administrators can move the history files from that directory to some other place, delete them, archive them, etc.).

        Attachments

        1. HADOOP-5994-v1.7.patch
          22 kB
          Amar Kamat
        2. HADOOP-5994-v1.8.patch
          22 kB
          Amar Kamat
        3. HADOOP-5994-v2.0.patch
          34 kB
          Amar Kamat
        4. HADOOP-5994-v2.1.patch
          39 kB
          Amar Kamat
        5. MAPREDUCE-416-v1.3.patch
          31 kB
          Amar Kamat
        6. MAPREDUCE-416-v1.4.patch
          31 kB
          Amar Kamat
        7. MAPREDUCE-416-v1.5.patch
          31 kB
          Amar Kamat
        8. MAPREDUCE-416-v1.6.patch
          31 kB
          Amar Kamat
        9. MAPREDUCE-416-v1.6-branch-0.20.patch
          32 kB
          Amar Kamat

              People

              • Assignee:
                amar_kamat Amar Kamat
                Reporter:
                devaraj Devaraj Das
              • Votes:
                0
                Watchers:
                2
