Seems like it's possible to avoid multiple AMs by tuning the AM_LIVENESS_INTERVAL (10 minutes by default) and MR_AM_TO_RM_WAIT_INTERVAL_MS (6 minutes by default). A new AM should only be started after the existing AM is done.
That almost solves the problem, but there are some corner cases left unsolved. For example:
1) The AM is running on a node whose NM suddenly declares itself UNHEALTHY via its health-check script
2) The RM removes the node from the active nodes and kills all containers running on that node
3) A network cut occurs. The NM never receives the notification to kill the containers, and/or the NM crashes. The AM is unable to communicate with the RM.
4) The RM now thinks all containers on that node are dead and proceeds to launch a new AM attempt
5) Now for the next 6 minutes (or whatever the AM-to-RM expiry interval is) we have two app attempts running simultaneously. If the old AM attempt can still reach HDFS or whatever it needs to commit, we could end up committing twice.
Could add a check to ensure the window interval is greater than the AM-RM heartbeat interval.
Actually, that's not strictly necessary; the code functions correctly even if the commit window is smaller than the heartbeat interval. Job commit is woken up when a fresh heartbeat arrives, and task commit periodically polls whether a heartbeat has occurred recently. The interval between heartbeats doesn't have to be smaller than the commit window for a commit to proceed, but if it isn't, it's more likely a commit operation will stall waiting for a fresh heartbeat.
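To make the commit-window idea concrete, here is a minimal sketch of the check described above: a commit is allowed only if the RM has been heard from within the window, so an attempt that has lost contact (and may have been superseded) refuses to commit. All names here (HeartbeatSource, CommitWindowCheck, commitWindowMs) are illustrative, not the actual MapReduce implementation.

```java
public class CommitWindowCheck {
    /** Illustrative stand-in for whatever tracks RM heartbeats. */
    interface HeartbeatSource {
        long getLastHeartbeatTime(); // timestamp of the last successful RM heartbeat
    }

    private final HeartbeatSource heartbeats;
    private final long commitWindowMs;

    CommitWindowCheck(HeartbeatSource heartbeats, long commitWindowMs) {
        this.heartbeats = heartbeats;
        this.commitWindowMs = commitWindowMs;
    }

    /**
     * A commit may proceed only if a heartbeat was received within the
     * commit window, i.e. the RM has not had time to declare this attempt
     * dead and launch a replacement.
     */
    boolean canCommit(long now) {
        return now - heartbeats.getLastHeartbeatTime() < commitWindowMs;
    }
}
```

Note that this works even when the heartbeat interval exceeds the window: the commit simply stalls until the next heartbeat refreshes the timestamp.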
Does getClock() need to be part of RMHeartbeatHandler? Looks like the AppContext can provide this.
I put it in the interface so the caller can access the same clock used to timestamp the heartbeat, in case it differed from the AppContext clock or the caller didn't have access to the AppContext. But that's probably never going to be a real concern, so I'll take it out.
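For illustration, here is roughly what the slimmed-down handler interface would look like, with a toy in-memory implementation showing how a blocked job commit gets woken by a fresh heartbeat. The method names are assumptions based on this discussion, not the exact patch; getClock() is omitted since callers can use the AppContext clock.

```java
import java.util.ArrayList;
import java.util.List;

interface RMHeartbeatHandler {
    long getLastHeartbeatTime();
    void runOnNextHeartbeat(Runnable callback); // e.g. wakes a waiting job commit
    // Clock getClock();  // dropped: the AppContext clock suffices
}

class SimpleRMHeartbeatHandler implements RMHeartbeatHandler {
    private long lastHeartbeat;
    private final List<Runnable> callbacks = new ArrayList<>();

    public synchronized long getLastHeartbeatTime() {
        return lastHeartbeat;
    }

    public synchronized void runOnNextHeartbeat(Runnable callback) {
        callbacks.add(callback);
    }

    /** Invoked when a fresh heartbeat response arrives from the RM. */
    public synchronized void heartbeatReceived(long timestamp) {
        lastHeartbeat = timestamp;
        for (Runnable cb : callbacks) {
            cb.run();
        }
        callbacks.clear();
    }
}
```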
And to address Bikas' comment:
Independent of this change, this looks like a problem that needs to be solved in the platform rather than in the AM.
We might be able to close all the corner cases in the framework. For example, the above scenario could be solved if the RM were to wait for confirmation from the NM of the containers actually expiring before proceeding to launch another attempt. If the NM is unreachable before the confirmation is received, it could wait for the AM expiry interval before launching a new attempt. It could mean that we wait a lot longer than necessary, but at least we'd know with confidence that two attempts aren't running simultaneously.
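The proposed RM-side policy could be sketched as a simple decision function: launch a new attempt immediately only once the NM confirms the old containers are gone, and otherwise wait out the full AM expiry interval. This is a hedged sketch of the idea above; all names (AttemptRelaunchPolicy, onNodeLost, and both parameters) are hypothetical.

```java
enum Decision { LAUNCH_NEW_ATTEMPT, WAIT }

class AttemptRelaunchPolicy {
    private final long amExpiryMs;

    AttemptRelaunchPolicy(long amExpiryMs) {
        this.amExpiryMs = amExpiryMs;
    }

    Decision onNodeLost(boolean nmConfirmedContainersDead,
                        long msSinceAmLastHeard) {
        if (nmConfirmedContainersDead) {
            // Safe: the old attempt's containers are known to be gone.
            return Decision.LAUNCH_NEW_ATTEMPT;
        }
        // NM unreachable: wait out the full AM expiry interval so the old
        // attempt cannot still be running when the new one starts.
        return msSinceAmLastHeard >= amExpiryMs
                ? Decision.LAUNCH_NEW_ATTEMPT
                : Decision.WAIT;
    }
}
```

As noted above, this trades latency for safety: relaunch may be delayed well past the actual container deaths, but two attempts can never overlap.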