Hadoop Map/Reduce / MAPREDUCE-4428

A failed job is not available in the job history if the job is killed right around the time it is reported as failed


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0-alpha
    • None
    • None

    Description

      We have consistently observed this issue when running Hadoop CDH4 (based on the 2.0-alpha release):

      When our Hadoop client code is notified that a job has completed (using the RunningJob object job, with job.isComplete() && !job.isSuccessful()), it unconditionally calls job.killJob() to terminate the job.
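      The client-side pattern described above can be sketched as follows. This is a minimal, self-contained sketch, not the reporter's actual code: JobHandle is a hypothetical stand-in that mirrors the isComplete(), isSuccessful() and killJob() methods of org.apache.hadoop.mapred.RunningJob so the example compiles without Hadoop on the classpath, and FakeJob and handleCompletion are illustrative names.

```java
import java.io.IOException;

public class FailedJobHandler {

    /** Hypothetical stand-in mirroring the relevant methods of org.apache.hadoop.mapred.RunningJob. */
    interface JobHandle {
        boolean isComplete() throws IOException;
        boolean isSuccessful() throws IOException;
        void killJob() throws IOException;
    }

    /** Illustrative in-memory job so the logic can run without a cluster. */
    static class FakeJob implements JobHandle {
        final boolean complete;
        final boolean successful;
        int killCalls = 0;

        FakeJob(boolean complete, boolean successful) {
            this.complete = complete;
            this.successful = successful;
        }

        public boolean isComplete() { return complete; }
        public boolean isSuccessful() { return successful; }
        public void killJob() { killCalls++; }
    }

    /**
     * The pattern from the report: once a job is complete but not successful,
     * kill it unconditionally. Returns true if killJob() was called.
     */
    static boolean handleCompletion(JobHandle job) throws IOException {
        if (job.isComplete() && !job.isSuccessful()) {
            // On MRv2 this kill, issued as the job is reported failed,
            // coincides with the job missing from the JobHistory server.
            job.killJob();
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        FakeJob failed = new FakeJob(true, false);
        System.out.println(handleCompletion(failed)); // prints true
        System.out.println(failed.killCalls);         // prints 1
    }
}
```

      Against a real cluster, the same check would be made on the RunningJob instance returned by the job client.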

      With earlier Hadoop versions (verified on Hadoop 0.20.2), we still had full access to the job logs afterwards through the Hadoop console. With MapReduce v2, however, the failed job no longer shows up in the JobHistory server, and the job's tracking URL still points to the HTTP port of the Application Master, which no longer exists.

      Once we removed the call to job.killJob() for failed jobs from our Hadoop client code, we could access the job in the job history with MapReduce v2 as well. This therefore appears to be a race condition in job management with respect to the job history of failed jobs.
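      The client-side workaround described above amounts to not calling killJob() on a job that has already completed. A minimal sketch, assuming the client may still want to kill jobs that are genuinely running (for example on cancellation): JobHandle is again a hypothetical stand-in for the relevant methods of org.apache.hadoop.mapred.RunningJob, and killIfStillRunning and FakeJob are illustrative names. This only avoids triggering the race from the client; it does not address the underlying job-history problem.

```java
import java.io.IOException;

public class GuardedKill {

    /** Hypothetical stand-in mirroring the relevant methods of org.apache.hadoop.mapred.RunningJob. */
    interface JobHandle {
        boolean isComplete() throws IOException;
        void killJob() throws IOException;
    }

    /** Illustrative in-memory job so the logic can run without a cluster. */
    static class FakeJob implements JobHandle {
        final boolean complete;
        int killCalls = 0;

        FakeJob(boolean complete) {
            this.complete = complete;
        }

        public boolean isComplete() { return complete; }
        public void killJob() { killCalls++; }
    }

    /**
     * Kills the job only while it is still running. A job that has already
     * finished (successfully or not) is left alone so its history is kept.
     * Returns true if killJob() was called.
     */
    static boolean killIfStillRunning(JobHandle job) throws IOException {
        if (!job.isComplete()) {
            job.killJob();
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        FakeJob finished = new FakeJob(true);
        System.out.println(killIfStillRunning(finished)); // prints false
        FakeJob running = new FakeJob(false);
        System.out.println(killIfStillRunning(running));  // prints true
    }
}
```

      Note that a check-then-kill guard like this still leaves a small window in which the job can finish between isComplete() and killJob().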

      We have collected the Application Master and NodeManager logs for this scenario, if they would help isolate the problem and the fix.

      Attachments

        1. am_failed_counter_limits.txt
          2.40 MB
          Rahul Jain
        2. resrcmgr_bad.txt
          202 kB
          Rahul Jain
        3. appMaster_good.txt
          578 kB
          Rahul Jain
        4. appMaster_bad.txt
          512 kB
          Rahul Jain


            People

              revans2 Robert Joseph Evans
              rjain7 Rahul Jain
              Votes: 3
              Watchers: 12
