  1. Hadoop Map/Reduce
  2. MAPREDUCE-416

Move the completed jobs' history files to a DONE subdirectory inside the configured history directory


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Release Note:
      Once a job is done, its history file and the associated conf file are moved to the history.folder/done folder. This avoids cluttering the running jobs' folder, and the framework is no longer affected by files in the done folder. This helps in two ways:
      1) ls on the running folder (during recovery) is faster with fewer files
      2) changes to files in the running folder result in FileNotFoundException, whereas files in the done folder can be changed without affecting the framework.


      So with the existing code, the best way to keep the running folder clean is to note the IDs of the running jobs and then move the files that do not belong to those jobs to the done folder. Note that on average there will be 2 files per job in the history folder, namely
      1) job history file
      2) conf file.
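
      The manual cleanup described above can be sketched as a small script. This is only an illustration under stated assumptions: it works on a local directory rather than through the Hadoop FileSystem API, the function name move_finished_job_files is hypothetical, and it assumes that every history and conf file name embeds a job ID of the form job_<digits>_<digits>. The set of running job IDs has to be collected separately, for example from the output of `hadoop job -list`.

```python
import re
import shutil
from pathlib import Path

# Assumption: each history/conf file name contains a job ID such as
# job_200904211745_0001. Files without one are left untouched.
JOB_ID_PATTERN = re.compile(r"job_\d+_\d+")


def move_finished_job_files(history_dir, running_job_ids):
    """Move files of jobs that are not running into history_dir/done.

    Returns the names of the files that were moved, in sorted order.
    """
    history_dir = Path(history_dir)
    done_dir = history_dir / "done"
    done_dir.mkdir(exist_ok=True)

    moved = []
    for path in sorted(history_dir.iterdir()):
        if not path.is_file():
            continue  # skips sub-directories, including done itself
        match = JOB_ID_PATTERN.search(path.name)
        if match is None or match.group(0) in running_job_ids:
            continue  # unrecognised file, or the job is still running
        shutil.move(str(path), str(done_dir / path.name))
        moved.append(path.name)
    return moved
```

      A call such as move_finished_job_files("/path/to/history", {"job_200904211745_0002"}) would leave the files of job_200904211745_0002 in place and move every other recognised job file into the done sub-folder.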

      After a restart, there might be more than 2 files, mostly extra conf files. In that case keep the oldest conf file (based on timestamp) and delete the rest. Note that it is better to do this when the jobtracker is down.
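
      The conf-file pruning step can be sketched in the same way. Again this is an assumption-laden illustration, not part of the patch: the function name prune_extra_conf_files is hypothetical, conf files are assumed to end in _conf.xml and to embed a job ID of the form job_<digits>_<digits>, and "oldest" is taken to mean the earliest file modification time (the note does not say which timestamp is meant).

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: conf file names embed a job ID such as job_200904211745_0001.
JOB_ID_PATTERN = re.compile(r"job_\d+_\d+")


def prune_extra_conf_files(history_dir, conf_suffix="_conf.xml"):
    """Keep the oldest conf file of each job and delete the others.

    Returns the sorted names of the files that were deleted.
    """
    history_dir = Path(history_dir)

    # Group the conf files by the job ID found in their names.
    conf_files_by_job = defaultdict(list)
    for path in history_dir.iterdir():
        match = JOB_ID_PATTERN.search(path.name)
        if path.is_file() and match and path.name.endswith(conf_suffix):
            conf_files_by_job[match.group(0)].append(path)

    deleted = []
    for paths in conf_files_by_job.values():
        # Oldest first; the file name breaks ties deterministically.
        paths.sort(key=lambda p: (p.stat().st_mtime, p.name))
        for extra in paths[1:]:
            extra.unlink()
            deleted.append(extra.name)
    return sorted(deleted)
```

      As the note says, a cleanup like this is best run while the jobtracker is down, so that no file is removed while the framework is still using it.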

    Description

      Whenever a job completes, the history file can be moved to a directory called DONE. That would make the management of job history files easier (for example, administrators can move the history files from that directory to some other place, delete them, archive them, etc.).

      Attachments

        1. MAPREDUCE-416-v1.6-branch-0.20.patch
          32 kB
          Amar Kamat
        2. MAPREDUCE-416-v1.6.patch
          31 kB
          Amar Kamat
        3. MAPREDUCE-416-v1.5.patch
          31 kB
          Amar Kamat
        4. MAPREDUCE-416-v1.4.patch
          31 kB
          Amar Kamat
        5. MAPREDUCE-416-v1.3.patch
          31 kB
          Amar Kamat
        6. HADOOP-5994-v2.1.patch
          39 kB
          Amar Kamat
        7. HADOOP-5994-v2.0.patch
          34 kB
          Amar Kamat
        8. HADOOP-5994-v1.8.patch
          22 kB
          Amar Kamat
        9. HADOOP-5994-v1.7.patch
          22 kB
          Amar Kamat

            People

              Assignee: Amar Kamat (amar_kamat)
              Reporter: Devaraj Das (ddas)
              Votes: 0
              Watchers: 2
