Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.0.0-alpha
Fix Version/s: None
Component/s: None
Description
We have observed this issue consistently on the Hadoop CDH4 distribution (based on the 2.0.0-alpha release):
When our Hadoop client code is notified of a completed but unsuccessful job (using a RunningJob object job, with job.isComplete() && job.isSuccessful() == false), it issues an unconditional job.killJob() to terminate the job, as in the sketch below.
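For illustration, a minimal sketch of this client pattern, assuming the classic org.apache.hadoop.mapred API (the class name JobMonitor and the poll interval are our own, not from the original report):
{code:java}
import java.io.IOException;
import org.apache.hadoop.mapred.RunningJob;

public class JobMonitor {
    // Waits for the job to finish, then unconditionally kills it if it
    // failed -- this killJob() call is what makes the failed job
    // disappear from the JobHistory server under MRv2.
    static void waitAndKillIfFailed(RunningJob job)
            throws IOException, InterruptedException {
        while (!job.isComplete()) {
            Thread.sleep(5000L); // illustrative poll interval
        }
        if (!job.isSuccessful()) {
            job.killJob(); // unconditional kill of an already-failed job
            System.err.println("Job failed, tracking URL: "
                + job.getTrackingURL());
        }
    }
}
{code}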
With earlier Hadoop versions (verified on 0.20.2), we still have full access to the job logs afterwards through the Hadoop console. With MapReduce v2, however, the failed job no longer shows up in the JobHistory server, and the job's tracking URL still points to the HTTP port of the ApplicationMaster, which no longer exists.
Once we removed the job.killJob() call for failed jobs from our client code (see the guarded version below), we were able to access the job in the job history with MapReduce v2 as well. This therefore appears to be a race condition between killing an already-failed job and the recording of its job history.
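Under the same assumptions as above, a sketch of the workaround: never kill a job that has already completed, so a failed job still reaches the JobHistory server:
{code:java}
// Workaround: skip killJob() for jobs that have already completed;
// only kill jobs that are genuinely still running.
if (job.isComplete()) {
    if (!job.isSuccessful()) {
        System.err.println("Job failed: " + job.getID());
    }
} else {
    job.killJob();
}
{code}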
We have collected the ApplicationMaster and NodeManager logs for this scenario, if that helps isolate the problem and the fix.
Issue Links
- is related to: MAPREDUCE-4559 Job logs not accessible through job history server for AM killed due to am.liveness-monitor expiry (Open)
- relates to: MAPREDUCE-5418 JobHistoryServer has no information about applications if the MR-AM crashes (Resolved)