[MAPREDUCE-4890] Invalid TaskImpl state transitions when task fails while speculating - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.0.2-alpha, 0.23.5
Fix Version/s: 2.0.3-alpha, 0.23.6
Component/s: mr-am
Labels: None
Target Version/s: 2.0.3-alpha, 0.23.6
Hadoop Flags: Reviewed

Description

There are two issues when a task fails while speculating (i.e., while multiple attempts are active):

- The other active attempts are not killed.
- TaskImpl's FAILED state does not handle the T_ATTEMPT_* set of events. The other active task attempts can send any of these events asynchronously after the task has failed, so all of them need to be handled.
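
The two issues above can be illustrated with a minimal sketch. This is a hypothetical, heavily simplified model and not the actual TaskImpl code or the attached patch: the class SpeculativeTaskSketch, its State and Event enums, and the rule that a single attempt failure fails the whole task are all assumptions made for illustration. It shows a failing task requesting kills for the remaining attempts, and a FAILED state that absorbs late attempt events instead of rejecting them.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a task with speculative attempts.
// Not the real TaskImpl; names and rules here are illustrative only.
class SpeculativeTaskSketch {
    enum State { RUNNING, FAILED }

    // Illustrative stand-ins for the T_ATTEMPT_* family of events.
    enum Event { T_ATTEMPT_SUCCEEDED, T_ATTEMPT_FAILED, T_ATTEMPT_KILLED }

    State state = State.RUNNING;
    final Set<String> activeAttempts = new LinkedHashSet<>();
    final List<String> killRequests = new ArrayList<>();

    void launch(String attemptId) {
        activeAttempts.add(attemptId);
    }

    void handle(Event event, String attemptId) {
        // The reporting attempt is finished whatever the outcome.
        activeAttempts.remove(attemptId);

        if (state == State.FAILED) {
            // Late event from a sibling attempt: accept it and stay FAILED
            // rather than treating it as an invalid transition.
            return;
        }
        if (event == Event.T_ATTEMPT_FAILED) {
            // Assumption: one failed attempt fails the task in this sketch.
            state = State.FAILED;
            // Request a kill for every attempt that is still running.
            killRequests.addAll(activeAttempts);
        }
    }

    public static void main(String[] args) {
        SpeculativeTaskSketch task = new SpeculativeTaskSketch();
        task.launch("attempt_0");
        task.launch("attempt_1");
        task.handle(Event.T_ATTEMPT_FAILED, "attempt_0");
        // The sibling's event arrives after the task has already failed.
        task.handle(Event.T_ATTEMPT_KILLED, "attempt_1");
        System.out.println(task.state + " " + task.killRequests);
    }
}
```

The design point is that once sibling attempts exist, the terminal state has to tolerate every event they can still emit, because their delivery order is not under the task's control.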

If this is not handled properly, a job that both speculates and is configured to tolerate failures via mapreduce.map.failures.maxpercent or mapreduce.reduce.failures.maxpercent can easily fail because of an invalid state transition, instead of completing successfully with a few explicitly allowed task failures.
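
To make the tolerance setting concrete, here is a small sketch of how a percentage threshold of this kind could be evaluated. The class FailureToleranceSketch and the method exceedsAllowedFailures are hypothetical names, and the exact comparison is an assumption about how such a limit would typically be applied, not code taken from Hadoop.

```java
// Hypothetical helper; not part of Hadoop. It assumes the job is failed
// only when the share of failed tasks strictly exceeds the allowed percent.
class FailureToleranceSketch {
    static boolean exceedsAllowedFailures(int failedTasks, int totalTasks, int maxPercent) {
        // Compare failed/total against maxPercent/100 using integer math;
        // long avoids overflow for very large task counts.
        return (long) failedTasks * 100 > (long) maxPercent * totalTasks;
    }

    public static void main(String[] args) {
        // With a 5 percent allowance over 100 tasks, 5 failures are
        // tolerated and a 6th is not.
        System.out.println(exceedsAllowedFailures(5, 100, 5));
        System.out.println(exceedsAllowedFailures(6, 100, 5));
    }
}
```

Under this reading, a job within its allowance should still succeed, which is why an invalid state transition triggered by a tolerated task failure defeats the purpose of the setting.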

Attachments

MAPREDUCE-4890.patch (6 kB), uploaded 19/Dec/12 18:03 by Jason Darrell Lowe

People

Assignee: Jason Darrell Lowe

Reporter: Jason Darrell Lowe

Votes: 0

Watchers: 3

Dates

Created: 19/Dec/12 02:52

Updated: 15/Feb/13 13:09

Resolved: 22/Dec/12 01:52