Hadoop Map/Reduce
  MAPREDUCE-693

Conf files not moved to "done" subdirectory after JT restart

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Affects Version/s: 0.20.1
    • Fix Version/s: None
    • Component/s: jobtracker
    • Labels: None

      Description

      After MAPREDUCE-516, when a job is submitted and the JT is restarted (before job files have been written) and the job is killed after recovery, the conf files fail to be moved to the "done" subdirectory.
      The exact scenario to reproduce this issue is:

      • Submit a job
      • Restart JT before anything is written to the job files
      • Kill the job
      • The old conf files remain in the history folder and are not moved to the "done" subdirectory

        Activity

        Amar Kamat added a comment -

        Used this patch and the branch 0.20 patch for MAPREDUCE-683 and now TestJobTrackerRestart passes.

        Amar Kamat added a comment -

        Oops, I uploaded the wrong patch. Here is the patch for branch 0.20, not to be committed.

        Amar Kamat added a comment -

        Note that this is a feature and MAPREDUCE-11 should fix this issue in the right way. Thoughts?

        Amar Kamat added a comment -

        Attaching a fix for branch 0.20, not to be committed.

        Amar Kamat added a comment -

        The old conf files remain in the history folder and fail to be moved to "done" subdirectory

        There is no need to move the conf file to the done folder. In this case the job is run as a new job, and hence a new conf file is created for it. The jobhistory file gets deleted as it is required for recovery (checkpoint process). The conf file doesn't play any role in the recovery process. Here is what is happening:

        1. The jobtracker starts with id id1.
        2. Job job1 is submitted and creates the history file hostname_id1_job1_user_jobname and the conf file hostname_id1_job1_conf.xml.
        3. The jobtracker restarts with id id2.
        4. The jobtracker tries to recover the job. There are two possibilities here:
          1. If the job-initialization thread inits the job before the recovery-manager picks it up for recovery, then the new history filename would be hostname_id1_job1_user_jobname.recover and the conf file would be hostname_id1_job1_conf.xml. In this case there won't be any garbage left in the history folder.
          2. If the recovery-manager picks up the job before the init-thread does, it will assume that there is nothing to recover and will delete hostname_id1_job1_user_jobname (leaving hostname_id1_job1_conf.xml). When the job inits, it will take new filenames, i.e. hostname_id2_job1_user_jobname and hostname_id2_job1_conf.xml. Only in this case is the conf file (hostname_id1_job1_conf.xml) left behind in the history folder.

        AFAIK this is a timing issue. I think a proper fix for all these corner cases is MAPREDUCE-11. Thoughts?
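
        The two orderings described above can be sketched as a small simulation. This is only an illustrative model of the file names mentioned in the comment, not JobTracker code: the function name `orphaned_history_files` and its parameter are hypothetical, and the sketch assumes that in the first case the original history file is superseded by the `.recover` file.

```python
def orphaned_history_files(recovery_manager_first):
    """Return history-folder files that the recovered job no longer uses.

    Hypothetical model of the scenario in the comment; file names are
    taken from the comment, the logic is a simplification.
    """
    # Steps 1-2: the jobtracker with id1 accepts job1 and creates both files.
    old_history = "hostname_id1_job1_user_jobname"
    old_conf = "hostname_id1_job1_conf.xml"
    on_disk = {old_history, old_conf}

    # Steps 3-4: the jobtracker restarts with id2 and tries to recover job1.
    if recovery_manager_first:
        # Case 2: the recovery-manager finds nothing to recover and deletes
        # the old history file; the job then inits under id2 file names.
        on_disk.discard(old_history)
        in_use = {
            "hostname_id2_job1_user_jobname",
            "hostname_id2_job1_conf.xml",
        }
    else:
        # Case 1: the init thread wins; history continues in a .recover
        # file and the id1 conf file stays in use.
        # Assumption: the .recover file replaces the original history file.
        on_disk.discard(old_history)
        in_use = {old_history + ".recover", old_conf}

    on_disk |= in_use
    # Anything on disk that the running job does not reference is garbage.
    return on_disk - in_use


if __name__ == "__main__":
    print(sorted(orphaned_history_files(recovery_manager_first=False)))
    print(sorted(orphaned_history_files(recovery_manager_first=True)))
```

        Under these assumptions only the second ordering leaves a stale file behind, namely hostname_id1_job1_conf.xml, which matches the timing-dependent behaviour described in the comment.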


          People

          • Assignee: Unassigned
          • Reporter: Ramya Sunil
          • Votes: 0
          • Watchers: 0
