Details
-
Bug
-
Status: Closed
-
Blocker
-
Resolution: Fixed
-
0.23.1
-
None
-
Reviewed
-
Fixed a race condition in MR AM which is failing the sort benchmark consistently.
Description
With the code checked out on last two days.
Sort Job on 350 node scale with 16800 maps and 680 reduces consistently failing for around last 6 runs
When around 50% of maps are completed, suddenly job jumps to failed state.
On looking at NM log, found RM sent Stop Container Request to NM for AM container.
But at INFO level from RM log not able find why RM is killing AM when job is not killed manually.
One thing found common on failed AM logs is -:
org.apache.hadoop.yarn.state.InvalidStateTransitonException
With with different.
For e.g. One log says -:
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_UPDATE at ASSIGNED
Whereas other logs says -:
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_COUNTER_UPDATE at ERROR
Attachments
Attachments
Issue Links
- fixes
-
MAPREDUCE-7481 Invalid transitions for TaskAttemptImpl
- Open