Karam Singh was executing a sleep job with 100,000 tasks on a 350 node cluster to test MR AM's scalability and ran into this. The job ran successfully but the history was not available.
I debugged around and figured that the job is finishing prematurely before the JobHistory is written. In most of the cases, we don't see this bug as we have a 5 seconds sleep in AM towards the end.
|Status||Resolved [ 5 ]||Closed [ 6 ]|
|Status||Patch Available [ 10002 ]||Resolved [ 5 ]|
|Resolution||Fixed [ 1 ]|