Hadoop Map/Reduce / MAPREDUCE-2059

RecoveryManager attempts to add jobtracker.info

    Details

    • Hadoop Flags:
      Reviewed
    • Tags:
      Jobtracker

      Description

      The jobtracker is treating the file 'jobtracker.info' in the system data directory as a job to be recovered, resulting in the following:

      10/09/09 18:06:02 WARN mapred.JobTracker: Failed to add the job jobtracker.info
      java.lang.IllegalArgumentException: JobId string : jobtracker.info is not properly formed
      at org.apache.hadoop.mapreduce.JobID.forName(JobID.java:158)
      at org.apache.hadoop.mapred.JobID.forName(JobID.java:84)
      at org.apache.hadoop.mapred.JobTracker$RecoveryManager.addJobForRecovery(JobTracker.java:1057)
      at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1565)
      at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:275)
      at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:267)
      at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:262)
      at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4256)
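The trace above shows the file name being handed to a job ID parser, which rejects it. As a rough illustration only, the sketch below uses a simplified stand-in for that check. It assumes job IDs look like job_<identifier>_<number>; the class name JobIdShapeCheck, the method isWellFormedJobId, and the regular expression are hypothetical and are not the actual org.apache.hadoop.mapreduce.JobID code.

```java
import java.util.regex.Pattern;

public class JobIdShapeCheck {
    // Assumed shape of a job ID: "job", an identifier, and a numeric sequence,
    // separated by underscores. This is a simplification for illustration.
    private static final Pattern JOB_ID = Pattern.compile("job_[^_]+_\\d+");

    // Returns true when the string has the assumed job ID shape.
    public static boolean isWellFormedJobId(String s) {
        return s != null && JOB_ID.matcher(s).matches();
    }

    // Mirrors the failure mode in the log: a malformed name raises
    // IllegalArgumentException instead of being skipped.
    public static String requireJobId(String s) {
        if (!isWellFormedJobId(s)) {
            throw new IllegalArgumentException(
                "JobId string : " + s + " is not properly formed");
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedJobId("job_201009091806_0001"));
        System.out.println(isWellFormedJobId("jobtracker.info"));
    }
}
```

Under these assumptions, any non-job file that sits in the same directory as the job files will trip the parser unless it is filtered out first.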

      1. MAPREDUCE-2059.patch
        7 kB
        Konstantin Shvachko
      2. MAPREDUCE-2059.patch
        5 kB
        Subroto Sanyal

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-22-branch #91 (See https://builds.apache.org/job/Hadoop-Mapreduce-22-branch/91/)
        MAPREDUCE-2059. RecoveryManager excludes jobtracker.info from the list of jobs to be recovered. Contributed by Subroto Sanyal.

        shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1204275
        Files :

        • /hadoop/common/branches/branch-0.22/mapreduce/CHANGES.txt
        • /hadoop/common/branches/branch-0.22/mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java
        • /hadoop/common/branches/branch-0.22/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestRecoveryManager.java
        Konstantin Shvachko added a comment -

        I just committed this to the 0.22 branch. Thanks, Subroto.
        Keeping it open until inclusion in 0.20.security is decided.

        Konstantin Shvachko added a comment -

        Otherwise code looks good +1.

        Konstantin Shvachko added a comment -

        I was impatient. It runs for about 5 minutes. But the new test was failing because the previous test case, testJobTrackerInfoCreation(), was not closing MiniDFSCluster.
        I added the shutdown statement and cleaned up some deprecations in the new test.
        I also changed the job completion threshold from 50% to 20%, which reduced the running time from 290 sec to 150 sec.

        Konstantin Shvachko added a comment -

        I see this problem in 0.22 and I think the fix is right. Unfortunately, the test does not succeed. It loops forever waiting for the job to reach 50% completion which it never does. I would like to commit it to 0.22 if the test is fixed.
        I see that 0.20.security has the same problem.
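A test that waits for a job to reach a completion threshold can avoid looping forever by giving the wait a deadline. The sketch below is a generic illustration of that idea, not code from TestRecoveryManager. The class BoundedProgressWait, the method waitForPercent, and all parameter names are hypothetical, and progress is modelled as a plain IntSupplier rather than a Hadoop job status call.

```java
import java.util.function.IntSupplier;

public class BoundedProgressWait {
    // Polls percentComplete until it reaches thresholdPercent or the timeout
    // expires. Returns true if the threshold was reached, false on timeout
    // or if the waiting thread was interrupted.
    public static boolean waitForPercent(IntSupplier percentComplete,
                                         int thresholdPercent,
                                         long timeoutMillis,
                                         long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (percentComplete.getAsInt() < thresholdPercent) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // give up instead of spinning forever
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A job stuck at 10% never reaches 20%, so the wait times out.
        System.out.println(waitForPercent(() -> 10, 20, 100L, 10L));
        // A job already at 25% passes the 20% threshold immediately.
        System.out.println(waitForPercent(() -> 25, 20, 100L, 10L));
    }
}
```

With a bounded wait, a job that never makes progress turns into a test failure with a clear cause rather than a hung build.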

        Arun C Murthy added a comment -

        Sorry to come in late, the patch has gone stale. Can you please rebase? Thanks.

        Given this is not an issue with MRv2 should we still commit this? I'm happy to, but not sure it's useful. Thanks.

        Subroto Sanyal added a comment -

        The attached patch checks that the names of files considered for job recovery do not start with the name of the restartCount file (jobtracker.info). This also filters out jobtracker.info.rec if it happens to be present.
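The prefix check described in this comment can be sketched as follows. This is a minimal, self-contained illustration and not the patch itself: the class RecoveryFileFilter, its methods, and the constant name are hypothetical, and file names are modelled as plain strings instead of Hadoop file status objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecoveryFileFilter {
    // Name of the restart-count file, as given in the issue description.
    public static final String RESTART_COUNT_FILE = "jobtracker.info";

    // A file is a recovery candidate only if its name does not start with
    // the restart-count file name. The prefix test also excludes
    // "jobtracker.info.rec".
    public static boolean isRecoveryCandidate(String fileName) {
        return !fileName.startsWith(RESTART_COUNT_FILE);
    }

    // Keeps only the names that should be passed on to job recovery.
    public static List<String> selectJobsToRecover(List<String> fileNames) {
        List<String> candidates = new ArrayList<>();
        for (String name : fileNames) {
            if (isRecoveryCandidate(name)) {
                candidates.add(name);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<String> systemDir = Arrays.asList(
            "job_201009091806_0001", "jobtracker.info", "jobtracker.info.rec");
        // prints [job_201009091806_0001]
        System.out.println(selectJobsToRecover(systemDir));
    }
}
```

A prefix test is used here rather than an exact match so that a leftover jobtracker.info.rec is skipped as well, which is the behavior the comment describes.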

        Subroto Sanyal added a comment -

        Hi Dan,

        I can see two approaches to fix the problem:
        a) Change the directory used for job information. Restart recovery would then always search this directory for jobs to recover.
        Problem: The system job directory is referenced from many places (JobTracker, TaskTracker, Client), so this approach may require code changes in multiple files.

        b) Add an explicit check for the jobtracker.info file while recovering jobs. This change is small and simple.

        Please provide your opinion.


          People

          • Assignee:
            Subroto Sanyal
            Reporter:
            Dan Adkins
          • Votes:
            1
            Watchers:
            6