[HADOOP-248] locating map outputs via random probing is inefficient - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.2.1
Fix Version/s: 0.12.0
Component/s: None
Labels: None

Description

Currently, the ReduceTaskRunner polls the JobTracker for a random list of map tasks, asking for their output locations. It would be better if the JobTracker kept an ordered log and the interface were changed to:

class MapLocationResults {
    public int getTimestamp();
    public MapOutputLocation[] getLocations();
}

interface InterTrackerProtocol {
    ...
    MapLocationResults locateMapOutputs(int prevTimestamp);
}

with the intention that each time a ReduceTaskRunner calls locateMapOutputs, it passes back the "timestamp" that it got from the previous result. That way, reduces can easily find the new MapOutputs. This should help the "ramp up" when the maps first start finishing.
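
The proposal can be sketched as an append-only log whose "timestamp" is simply the number of entries the caller has already seen. This is a minimal illustration under stated assumptions, not the committed patch: `MapOutputLog`, `recordCompletion`, and the fields of `MapOutputLocation` are hypothetical names introduced here. Only `MapLocationResults`, `getTimestamp`, `getLocations`, and `locateMapOutputs` come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's MapOutputLocation; the fields are illustrative only.
final class MapOutputLocation {
    final String mapTaskId;
    final String host;

    MapOutputLocation(String mapTaskId, String host) {
        this.mapTaskId = mapTaskId;
        this.host = host;
    }
}

// Result type from the description: the new locations plus the cursor to pass back next time.
final class MapLocationResults {
    private final int timestamp;
    private final MapOutputLocation[] locations;

    MapLocationResults(int timestamp, MapOutputLocation[] locations) {
        this.timestamp = timestamp;
        this.locations = locations;
    }

    public int getTimestamp() { return timestamp; }
    public MapOutputLocation[] getLocations() { return locations; }
}

// Hypothetical JobTracker-side log. Entries are only ever appended, so the
// log length at the time of a call works as the "timestamp" (assumption).
final class MapOutputLog {
    private final List<MapOutputLocation> log = new ArrayList<>();

    // Called when a map task finishes and its output becomes available.
    synchronized void recordCompletion(MapOutputLocation location) {
        log.add(location);
    }

    // Returns every entry appended after prevTimestamp, plus the new cursor.
    synchronized MapLocationResults locateMapOutputs(int prevTimestamp) {
        // Clamp so that a stale or invalid cursor cannot index outside the log.
        int from = Math.max(0, Math.min(prevTimestamp, log.size()));
        MapOutputLocation[] fresh =
            log.subList(from, log.size()).toArray(new MapOutputLocation[0]);
        return new MapLocationResults(log.size(), fresh);
    }
}

class MapOutputLogDemo {
    public static void main(String[] args) {
        MapOutputLog tracker = new MapOutputLog();
        tracker.recordCompletion(new MapOutputLocation("map_0", "hostA"));

        // A reduce starts with timestamp 0 and passes back whatever it last received.
        int timestamp = 0;
        MapLocationResults results = tracker.locateMapOutputs(timestamp);
        timestamp = results.getTimestamp();

        tracker.recordCompletion(new MapOutputLocation("map_1", "hostB"));
        results = tracker.locateMapOutputs(timestamp);
        for (MapOutputLocation loc : results.getLocations()) {
            System.out.println(loc.mapTaskId + " @ " + loc.host);
        }
    }
}
```

With this shape, each poll costs time proportional to the number of newly finished maps rather than requiring repeated random probes, which is the ramp-up benefit the description is after.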

Attachments

248-9.patch (25/Jan/07 23:40, 21 kB, Owen O'Malley)
248-fixed1.patch (22/Feb/07 14:45, 23 kB, Devaraj Das)
248-initial7.patch (24/Jan/07 13:47, 21 kB, Devaraj Das)
248-initial8.patch (25/Jan/07 16:50, 22 kB, Devaraj Das)

Issue Links

depends upon: HADOOP-801 job tracker should keep a log of task completion and failure (Closed)

relates to: HADOOP-343 In case of dead task tracker, the copy mapouts try copying all mapoutputs from this tasktracker (Closed)

People

Assignee: Devaraj Das

Reporter: Owen O'Malley

Votes: 0

Watchers: 0

Dates

Created: 24/May/06 06:25

Updated: 02/May/13 02:29

Resolved: 22/Feb/07 20:22