[MAPREDUCE-2038] Making reduce tasks locality-aware - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Description

Currently, the Hadoop MapReduce framework does not take data locality into consideration when it decides where to launch reduce tasks. There are several cases where this can be sub-optimal:

1. The map output data for a particular reduce task may not be distributed evenly across racks. This can happen when the job has few map tasks, or when the map output data is heavily skewed.
2. A reduce task may need to access a side file (e.g. in a Pig fragmented join, or in an incremental merge of a smaller unsorted dataset with an already-sorted large dataset). It would be useful to place reduce tasks based on the location of the side files they need to access.
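
To make the two cases concrete, here is a minimal sketch of one possible placement heuristic, assuming the scheduler knows, for each reduce partition, how many map-output bytes each map task produced and on which rack it ran, plus the racks holding any side-file blocks the reduce will read. The functions `preferred_racks` and `cross_rack_bytes` and their input shapes are hypothetical; they are not part of the Hadoop API. The heuristic simply ranks racks by the number of bytes a reduce task could read without crossing a rack boundary.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input shapes (NOT a Hadoop API):
#   map_outputs: (rack, bytes) pairs, one per map task that produced
#       data for this reduce partition.
#   side_file_blocks: (rack, bytes) pairs for blocks of a side file the
#       reduce task must read (e.g. the small input of a fragmented join).


def preferred_racks(
    map_outputs: Iterable[Tuple[str, int]],
    side_file_blocks: Iterable[Tuple[str, int]] = (),
) -> List[Tuple[str, int]]:
    """Rank racks by how many bytes a reduce task could read rack-locally."""
    local_bytes: Dict[str, int] = defaultdict(int)
    for rack, size in map_outputs:
        local_bytes[rack] += size
    for rack, size in side_file_blocks:
        local_bytes[rack] += size
    # Most rack-local bytes first; tie-break on rack name for determinism.
    return sorted(local_bytes.items(), key=lambda kv: (-kv[1], kv[0]))


def cross_rack_bytes(ranking: List[Tuple[str, int]], rack: str) -> int:
    """Bytes that must cross a rack boundary if the reduce runs on `rack`."""
    total = sum(size for _, size in ranking)
    return total - dict(ranking).get(rack, 0)


if __name__ == "__main__":
    # Case 1: few maps, output for this partition skewed toward rack r1.
    maps = [("r1", 700), ("r1", 200), ("r2", 50), ("r3", 50)]
    ranking = preferred_racks(maps)
    print(ranking)                          # [('r1', 900), ('r2', 50), ('r3', 50)]
    print(cross_rack_bytes(ranking, "r1"))  # 100

    # Case 2: a large side file on r3 outweighs the map output.
    ranking = preferred_racks(maps, side_file_blocks=[("r3", 2000)])
    print(ranking[0])                       # ('r3', 2050)
```

Ranking by rack-local bytes is only one possible policy; a real scheduler would also have to weigh free slots and the cost of delaying a reduce while waiting for a preferred rack.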

This JIRA was created to solicit ideas on how to make reduce-task placement locality-aware.

Issue Links

duplicates: MAPREDUCE-259 Rack-aware Shuffle (Resolved)

People

Assignee: Unassigned
Reporter: Hong Tang
Votes: 0
Watchers: 24

Dates

Created: 27/Aug/10 21:36
Updated: 17/Jul/14 17:37