Hadoop Map/Reduce: MAPREDUCE-416

Move the completed jobs' history files to a DONE subdirectory inside the configured history directory

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      Once a job is done, its history file and the associated conf file are moved to the history.folder/done folder. This keeps the running jobs' folder uncluttered, and the framework is no longer affected by the files in the done folder. This helps in 2 ways:
      1) ls on the running folder (used during recovery) is faster with fewer files
      2) changes in the running folder result in a FileNotFoundException, whereas files in the done folder can be changed without affecting the framework.

      With the existing code, the best way to keep the running folder clean is to note the IDs of the running jobs and then move the files that do not belong to any job in this list to the done folder. Note that on average there will be 2 files per job in the history folder, namely
      1) the job history file
      2) the conf file.

      After a restart there might be more than 2 files, mostly extra conf files. In such a case, keep the oldest conf file (based on timestamp) and delete the rest. Note that it is better to do this while the JobTracker is down.
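      The manual cleanup described in this note could be scripted roughly as follows. This is only a sketch under several assumptions that the issue does not state: the history folder is on a local filesystem, every file name embeds a job ID of the form job_<digits>_<digits>, conf files end in _conf.xml, and "oldest" is judged by file modification time. The function name clean_running_folder is hypothetical.

```python
import re
import shutil
from pathlib import Path

# Assumption: every history/conf file name embeds a job ID
# such as job_200906101234_0001.
JOB_ID = re.compile(r"job_\d+_\d+")


def clean_running_folder(history_dir, running_job_ids):
    """Move files of jobs that are not running into history_dir/done.

    If a finished job has several conf files (left over from restarts),
    only the oldest one (by modification time) is kept; the rest are
    deleted. Returns the sorted names of the files that were moved.
    """
    history_dir = Path(history_dir)
    done_dir = history_dir / "done"
    done_dir.mkdir(exist_ok=True)

    # Group the files of finished jobs by job ID.
    finished = {}
    for path in history_dir.iterdir():
        if not path.is_file():
            continue  # skips the done folder itself
        match = JOB_ID.search(path.name)
        if match is None or match.group(0) in running_job_ids:
            continue  # unrecognised file or job still running: leave it
        finished.setdefault(match.group(0), []).append(path)

    moved = []
    for paths in finished.values():
        confs = sorted(
            (p for p in paths if p.name.endswith("_conf.xml")),
            key=lambda p: p.stat().st_mtime,
        )
        extras = confs[1:]
        for extra in extras:
            extra.unlink()  # delete the newer, redundant conf files
        for path in paths:
            if path in extras:
                continue
            shutil.move(str(path), str(done_dir / path.name))
            moved.append(path.name)
    return sorted(moved)
```

      As the note advises, a script like this is best run while the JobTracker is down, so that the list of running job IDs cannot change underneath it.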

      Description

      Whenever a job completes, the history file can be moved to a directory called DONE. That would make the management of job history files easier (for example, administrators can move the history files from that directory to some other place, delete them, archive them, etc.).
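      As an illustration of the kind of management this makes possible, the sketch below packs everything in a DONE directory into a single archive and then deletes the originals. It is not part of the patch; it assumes the directory is on a local filesystem, and the function name archive_done_files and the choice of a gzip-compressed tar archive are hypothetical.

```python
import tarfile
from pathlib import Path


def archive_done_files(done_dir, archive_path):
    """Pack every file in done_dir into a .tar.gz, then delete the originals.

    Returns the sorted names of the archived files.
    """
    done_dir = Path(done_dir)
    files = sorted(p for p in done_dir.iterdir() if p.is_file())
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in files:
            # Store only the file name, not the full path.
            tar.add(str(path), arcname=path.name)
    # Delete only after the archive has been written and closed.
    for path in files:
        path.unlink()
    return [p.name for p in files]
```

      Because completed jobs' files are kept apart from the running jobs' files, a routine like this can run without touching anything the framework still reads.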

      1. MAPREDUCE-416-v1.6-branch-0.20.patch
        32 kB
        Amar Kamat
      2. MAPREDUCE-416-v1.6.patch
        31 kB
        Amar Kamat
      3. MAPREDUCE-416-v1.5.patch
        31 kB
        Amar Kamat
      4. MAPREDUCE-416-v1.4.patch
        31 kB
        Amar Kamat
      5. MAPREDUCE-416-v1.3.patch
        31 kB
        Amar Kamat
      6. HADOOP-5994-v2.1.patch
        39 kB
        Amar Kamat
      7. HADOOP-5994-v2.0.patch
        34 kB
        Amar Kamat
      8. HADOOP-5994-v1.8.patch
        22 kB
        Amar Kamat
      9. HADOOP-5994-v1.7.patch
        22 kB
        Amar Kamat

        Issue Links

          This issue incorporates MAPREDUCE-276

          Activity

          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Devaraj Das made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 0.21.0 [ 12314045 ]
          Resolution Fixed [ 1 ]
          Amar Kamat made changes -
          Release Note (same text as the Release Note field above)
          Amar Kamat made changes -
          Attachment MAPREDUCE-416-v1.6.patch [ 12411937 ]
          Attachment MAPREDUCE-416-v1.6-branch-0.20.patch [ 12411938 ]
          Amar Kamat made changes -
          Attachment MAPREDUCE-416-v1.5.patch [ 12411914 ]
          Amar Kamat made changes -
          Attachment MAPREDUCE-416-v1.4.patch [ 12411899 ]
          Amar Kamat made changes -
          Link This issue incorporates MAPREDUCE-276 [ MAPREDUCE-276 ]
          Amar Kamat made changes -
          Attachment MAPREDUCE-416-v1.3.patch [ 12411785 ]
          Owen O'Malley made changes -
          Project Hadoop Common [ 12310240 ] Hadoop Map/Reduce [ 12310941 ]
          Key HADOOP-5994 MAPREDUCE-416
          Component/s mapred [ 12310690 ]
          Fix Version/s 0.21.0 [ 12313563 ]
          Amar Kamat made changes -
          Attachment HADOOP-5994-v2.1.patch [ 12411193 ]
          Amar Kamat made changes -
          Attachment HADOOP-5994-v2.0.patch [ 12411061 ]
          Amar Kamat made changes -
          Attachment HADOOP-5994-v1.8.patch [ 12410461 ]
          Amar Kamat made changes -
          Field Original Value New Value
          Attachment HADOOP-5994-v1.7.patch [ 12410394 ]
          Devaraj Das created issue -

            People

            • Assignee:
              Amar Kamat
              Reporter:
              Devaraj Das
            • Votes:
              0
              Watchers:
              2
