[HADOOP-750] race condition on stalled map output fetches - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.8.0
Fix Version/s: 0.9.0
Component/s: None
Labels: None

Description

I've seen reduces getting killed because of a race condition in the ReduceTaskRunner. In the logs it looks like:

2006-11-27 08:40:44,795 WARN org.apache.hadoop.mapred.TaskRunner: Map output copy stalled on http://kry2296.inktomisearch.com:7030/mapOutput?map=task_0001_m_015626_0
...
2006-11-27 09:16:41,361 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000658_0 Need 52 map output(s)
2006-11-27 09:16:41,361 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000658_0 Got 39 known map output location(s); scheduling...
2006-11-27 09:16:41,361 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000658_0 Scheduled 0 of 39 known outputs (0 slow hosts and 39 dup hosts)
...
2006-11-27 09:16:47,071 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000658_0 0.3328575% reduce > copy (28679 of 28720 at 0.76 MB/s) >
...
2006-11-27 09:16:47,338 INFO org.apache.hadoop.mapred.TaskRunner: task_0001_r_000658_0 done copying task_0001_m_015462_0 output from node1
...
2006-11-27 09:36:51,398 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_r_000658_0: Task failed to report status for 1204 seconds. Killing.

Basically, the stall handling has a race condition that leaves the fetcher in a bad state. At the end of the fetch, all of the tasks finish, but their results never get handled. By the time the reduce task times out, all of the map output copiers are idle, waiting for something to fetch, while the prepare thread is waiting for results that never arrive.
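
One common way to avoid this kind of lost-result hang is to have every copier post an outcome, success or failure, to a single queue that the coordinating thread drains with a timed wait. The following is a minimal sketch of that pattern only. It is not the code in fetch-no-lease.patch, and every name in it (FetchSketch, CopyResult, fetchAll, failFirstAttempt) is hypothetical rather than taken from Hadoop.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the Hadoop source.
class FetchSketch {

    /** Outcome of one copy attempt. A copier always posts one, success or not. */
    static final class CopyResult {
        final int mapId;
        final boolean ok;

        CopyResult(int mapId, boolean ok) {
            this.mapId = mapId;
            this.ok = ok;
        }
    }

    /**
     * Simulates fetching numMaps map outputs. Ids in failFirstAttempt fail
     * once and are rescheduled. Returns the total number of copy attempts.
     */
    static int fetchAll(int numMaps, Set<Integer> failFirstAttempt) {
        BlockingQueue<CopyResult> results = new LinkedBlockingQueue<>();
        ExecutorService copiers = Executors.newFixedThreadPool(4);
        int remaining = numMaps;
        int attempts = 0;
        try {
            // Schedule one simulated copy per map output.
            for (int id = 0; id < numMaps; id++) {
                final int mapId = id;
                final boolean ok = !failFirstAttempt.contains(mapId);
                copiers.execute(() -> results.add(new CopyResult(mapId, ok)));
                attempts++;
            }
            // The coordinator is the only thread that updates the bookkeeping,
            // and it sees every outcome because all of them go through the queue.
            while (remaining > 0) {
                // Timed wait: a real task could report progress on each wake-up.
                CopyResult r = results.poll(1, TimeUnit.SECONDS);
                if (r == null) {
                    continue; // nothing finished yet
                }
                if (r.ok) {
                    remaining--;
                } else {
                    // Failed or stalled copy: reschedule it; the retry succeeds here.
                    final int mapId = r.mapId;
                    copiers.execute(() -> results.add(new CopyResult(mapId, true)));
                    attempts++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for copies", e);
        } finally {
            copiers.shutdownNow();
        }
        return attempts;
    }

    public static void main(String[] args) {
        Set<Integer> flaky = new HashSet<>(Arrays.asList(1, 3));
        System.out.println("attempts: " + fetchAll(5, flaky));
    }
}
```

Because a failed copy still produces a queue entry, the coordinator can always either count the output as done or reschedule it, so no worker result goes unhandled.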

Attachments

fetch-no-lease.patch (6 kB), 01/Dec/06 21:55, Owen O'Malley

Issue Links

is duplicated by

HADOOP-753 Problem with the patch for Hadoop-723

Closed

People

Assignee: Owen O'Malley

Reporter: Owen O'Malley

Dates

Created: 28/Nov/06 00:30

Updated: 08/Jul/09 16:52

Resolved: 01/Dec/06 22:27