Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 0.23.0
- None
- Reviewed
Description
Karams was running a sleep job with 100,000 tasks on a 350-node cluster to test the MR AM's scalability and ran into this. The job ran successfully, but its history was not available.
After some debugging, I found that the job finishes prematurely, before the JobHistory is written. In most cases we don't see this bug because the AM sleeps for 5 seconds towards the end, which usually gives the history writer enough time to finish.
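To illustrate the race, here is a minimal sketch of shutting down by draining pending history events instead of relying on a fixed sleep. It is not the actual patch or the MR AM code: `HistoryDrainSketch`, `HistoryWriter`, `handle`, `stop`, and `runJob` are hypothetical names, and an in-memory list stands in for the history file. It only assumes that history events are queued and written by a background thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class HistoryDrainSketch {

    /** Hypothetical stand-in for an asynchronous job history writer. */
    static class HistoryWriter {
        // Sentinel that tells the worker no more events will arrive.
        private static final String STOP = "__STOP__";

        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // Stands in for the history file on disk.
        private final List<String> written = new ArrayList<>();
        private final Thread worker;

        HistoryWriter() {
            worker = new Thread(() -> {
                try {
                    while (true) {
                        String event = queue.take();
                        if (event.equals(STOP)) {
                            return; // everything queued before STOP is already written
                        }
                        synchronized (written) {
                            written.add(event);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
        }

        /** Queues an event; the background thread writes it later. */
        void handle(String event) {
            queue.add(event);
        }

        /** Blocks until every event queued so far has been written. */
        void stop() {
            queue.add(STOP);
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while draining history", e);
            }
        }

        int writtenCount() {
            synchronized (written) {
                return written.size();
            }
        }
    }

    /** Simulates a job: emits one event per task plus a final job event. */
    static int runJob(int tasks) {
        HistoryWriter history = new HistoryWriter();
        for (int i = 0; i < tasks; i++) {
            history.handle("TASK_FINISHED " + i);
        }
        history.handle("JOB_FINISHED");
        // Wait for the writer to drain before "exiting", instead of sleeping
        // for a fixed time that a large job can outlast.
        history.stop();
        return history.writtenCount();
    }

    public static void main(String[] args) {
        System.out.println(runJob(100_000));
    }
}
```

With the explicit drain in `stop()`, the number of written events equals the number of tasks plus one regardless of job size. A fixed sleep only works while the backlog happens to clear within that window.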