Hadoop Common / HADOOP-5400

JT restart recovery: Exclude jobs which failed during SUBMIT_JOB (due to acl)


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Duplicate

    Description

      mapred.jobtracker.restart.recover is set to true in mapred-site.xml

      The following job failed during job submission because the user was not permitted by the queue ACL to perform SUBMIT_JOB:

      2009-03-04 18:31:25,970 INFO org.apache.hadoop.ipc.Server: IPC Server handler 14 on 50300, call submitJob(job_200903041223_0259) from 192.168.10.1:41306: error: org.apache.hadoop.security.AccessControlException: User rajive cannot perform operation SUBMIT_JOB on queue default

      When the JobTracker was restarted some time later, it tried to recover/restart the failed job:

      2009-03-04 19:13:30,544 INFO org.apache.hadoop.mapred.JobTracker: Found an incomplete job directory job_200903041852_0040. Deleting it!!
      2009-03-04 19:13:30,613 INFO org.apache.hadoop.mapred.FairScheduler: Successfully configured FairScheduler
      2009-03-04 19:13:30,614 INFO org.apache.hadoop.mapred.JobTracker: Trying to recover job job_200903041223_0259

      2009-03-04 18:53:17,147 INFO org.apache.hadoop.mapred.JobTracker: JobTracker failed to recover job job_200903041223_0259. Ignoring it.
      java.io.FileNotFoundException: File file:/var/log/hadoop//history/jobtracker1.foo.com_1236192735577_job_200903041223_0259_rajive_word+count does not exist.
      at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:360)
      at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
      at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
      at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
      at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:336)
      at org.apache.hadoop.mapred.JobHistory.parseHistoryFromFS(JobHistory.java:245)
      at org.apache.hadoop.mapred.JobTracker$RecoveryManager.recover(JobTracker.java:1144)
      at org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:1603)
      at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3326)
      2009-03-04 18:53:17,147 INFO org.apache.hadoop.mapred.JobTracker: Restart count for job job_200903041223_0259 is 0
      2009-03-04 18:53:18,626 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200903041223_0259 = 4664646202464
      2009-03-04 18:53:18,626 INFO org.apache.hadoop.mapred.JobInProgress: Split info for job:job_200903041223_0259 with 34640 splits:

      Jobs that failed during job submission should not be considered for recovery.
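
      The requested behaviour can be sketched roughly as below. This is only an illustration, not the JobTracker's actual recovery code: the class RecoveryFilterSketch, the enum SubmitOutcome, the type JobEntry and the method jobsToRecover are all hypothetical names invented for this example, and it assumes the submit outcome of each job is somehow known at restart time.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; none of these types exist in Hadoop. */
class RecoveryFilterSketch {

    /** Hypothetical outcome of the submitJob call for a job. */
    enum SubmitOutcome { ACCEPTED, REJECTED_BY_ACL }

    /** Hypothetical record of a job found by the JobTracker at restart. */
    static final class JobEntry {
        final String jobId;
        final SubmitOutcome outcome;

        JobEntry(String jobId, SubmitOutcome outcome) {
            this.jobId = jobId;
            this.outcome = outcome;
        }
    }

    /**
     * Returns the IDs of jobs that should be considered for recovery,
     * skipping every job whose submission was rejected by the ACL check.
     */
    static List<String> jobsToRecover(List<JobEntry> found) {
        List<String> recoverable = new ArrayList<>();
        for (JobEntry entry : found) {
            if (entry.outcome == SubmitOutcome.ACCEPTED) {
                recoverable.add(entry.jobId);
            }
        }
        return recoverable;
    }
}
```

      With such a filter, a job like job_200903041223_0259 above, whose submitJob call ended in an AccessControlException, would be dropped before recovery is attempted.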

          People

            Assignee: Unassigned
            Reporter: Rajiv Chittajallu (rajive)