Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
2.0.2-alpha, 0.23.5
None
Hadoop Flags: Reviewed
Description
A network problem can lead the ResourceManager (RM) to conclude that an ApplicationMaster (AM) has gone down and launch a replacement, while the previous AM is in fact still up and running. If the previous AM needs no further resources from the RM, it can go on to commit tasks or even the whole job. When the replacement AM finishes, it tries to commit as well, so the same output may be committed twice. This could result in data corruption.
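The scenario above is a race between two AM attempts for the same job. The Java sketch below shows one possible way to guard a commit, under stated assumptions; it is not the change made for this issue. An attempt refuses to commit if it has not heard from the RM within a commit window, and every attempt must atomically claim the commit in storage that all attempts share. `SharedCommitStore`, `AmAttempt`, `tryClaimCommit`, and the 10-second window are hypothetical names and values, and the in-memory map only stands in for shared durable storage (for example, a marker in a shared filesystem).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical sketch: two AM attempts for the same job both try to commit.
 * All names here are illustrative assumptions, not Hadoop APIs.
 */
public class CommitGuardSketch {

    /** Stand-in for storage visible to every AM attempt (assumption: in-memory here). */
    static final class SharedCommitStore {
        // jobId -> attempt number that won the right to commit
        private final ConcurrentMap<String, Integer> commitOwner = new ConcurrentHashMap<>();

        /** Atomically claims the commit; returns true only for the first claimant. */
        boolean tryClaimCommit(String jobId, int attempt) {
            return commitOwner.putIfAbsent(jobId, attempt) == null;
        }
    }

    /** One AM attempt; times are passed in explicitly to keep the sketch deterministic. */
    static final class AmAttempt {
        private final String jobId;
        private final int attempt;
        private final long commitWindowMs;
        private final long lastRmContactMs;

        AmAttempt(String jobId, int attempt, long commitWindowMs, long lastRmContactMs) {
            this.jobId = jobId;
            this.attempt = attempt;
            this.commitWindowMs = commitWindowMs;
            this.lastRmContactMs = lastRmContactMs;
        }

        /** Returns true if this attempt was allowed to perform the commit. */
        boolean commit(SharedCommitStore store, long nowMs) {
            // Guard 1: if the RM has been silent longer than the window, it may
            // already have declared this attempt dead, so do not commit.
            if (nowMs - lastRmContactMs > commitWindowMs) {
                return false;
            }
            // Guard 2: only the first attempt to claim the commit may proceed.
            return store.tryClaimCommit(jobId, attempt);
        }
    }

    public static void main(String[] args) {
        SharedCommitStore store = new SharedCommitStore();
        long now = 100_000L;
        // Attempt 1 lost contact with the RM 60 s ago but is still running.
        AmAttempt stale = new AmAttempt("job_1", 1, 10_000L, now - 60_000L);
        // Attempt 2 is the replacement launched by the RM and is in contact.
        AmAttempt replacement = new AmAttempt("job_1", 2, 10_000L, now - 1_000L);

        System.out.println("attempt 1 committed: " + stale.commit(store, now));
        System.out.println("attempt 2 committed: " + replacement.commit(store, now));
    }
}
```

Running `main` prints `attempt 1 committed: false` followed by `attempt 2 committed: true`. The atomic claim is what prevents a second commit; the window check only lowers the chance that an attempt the RM has already replaced acts at all, since a partition can begin right after the check. A real design would also have to handle an owner that dies after claiming the commit but before finishing it.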
Attachments
Issue Links
relates to
- MAPREDUCE-4831 Task commit can occur more than once due to AM retries (Resolved)
- MAPREDUCE-4819 AM can rerun job after reporting final job status to the client (Closed)