Hadoop Map/Reduce
MAPREDUCE-1838

DistRaid map tasks have large variance in running times

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.22.0
    • Component/s: contrib/raid
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      HDFS RAID uses map-reduce jobs to generate parity files for a set of source files. Each map task gets a subset of files to operate on. The current code assigns files by walking through the list of files given to the constructor of DistRaid.

      The problem is that the list of files given to the constructor is (pretty much) in directory-listing order. When a large number of files is added, neighboring files in that order tend to have similar sizes. Thus one map task can end up with mostly large files whereas another ends up with mostly small files, increasing the variance in run times.

      We could do smarter assignment by using the file sizes.
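
      For illustration, here is a minimal sketch of what "smarter assignment by using the file sizes" could look like: sort the files by decreasing size and hand each file to the map task with the smallest total so far (the classic LPT heuristic). This is not code from DistRaid or from the attached patch, and all class and method names are hypothetical; as the comments below note, splits in the real job must be contiguous portions of the split file, so the committed fix takes the simpler route of shuffling the list instead.

          import java.util.ArrayList;
          import java.util.Comparator;
          import java.util.List;
          import java.util.PriorityQueue;

          /** Hypothetical illustration of size-aware assignment (LPT heuristic). */
          public class SizeAwareAssignment {

            /** A (path, length) pair standing in for a file to be raided. */
            static class FileInfo {
              final String path;
              final long length;
              FileInfo(String path, long length) { this.path = path; this.length = length; }
            }

            /** Running total of bytes assigned to one map task. */
            static class Bucket {
              final int taskId;
              long totalBytes = 0;
              final List<FileInfo> files = new ArrayList<>();
              Bucket(int taskId) { this.taskId = taskId; }
            }

            /** Largest files first, each given to the least-loaded map task. */
            static List<Bucket> assign(List<FileInfo> files, int numMaps) {
              List<FileInfo> sorted = new ArrayList<>(files);
              sorted.sort(Comparator.comparingLong((FileInfo f) -> f.length).reversed());

              PriorityQueue<Bucket> byLoad = new PriorityQueue<>(
                  Comparator.comparingLong((Bucket b) -> b.totalBytes));
              List<Bucket> buckets = new ArrayList<>();
              for (int i = 0; i < numMaps; i++) {
                Bucket b = new Bucket(i);
                buckets.add(b);
                byLoad.add(b);
              }

              for (FileInfo f : sorted) {
                Bucket least = byLoad.poll();   // map task with the fewest bytes so far
                least.files.add(f);
                least.totalBytes += f.length;
                byLoad.add(least);
              }
              return buckets;
            }

            public static void main(String[] args) {
              List<FileInfo> files = new ArrayList<>();
              // Sizes chosen so that assigning files in listing order would be lopsided.
              long[] sizesMb = {900, 850, 800, 40, 30, 20, 10, 5};
              for (int i = 0; i < sizesMb.length; i++) {
                files.add(new FileInfo("/raid/src/file" + i, sizesMb[i] * 1024L * 1024L));
              }
              for (Bucket b : assign(files, 3)) {
                System.out.println("map " + b.taskId + ": " + b.files.size()
                    + " files, " + b.totalBytes + " bytes");
              }
            }
          }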

      Attachments

      1. MAPREDUCE-1838.patch (0.8 kB, Ramkumar Vadali)

        Activity

        Ramkumar Vadali added a comment -

        I had considered sorting the files in decreasing order of size before writing to the sequence file. The idea was to assign these files in a round-robin manner to each split. But splits are required to be contiguous portions of the split file, and it looks like we don't know the number of splits when generating the split file (is it the same as the number set in jobconf.setNumMapTasks?).

        One option is to simply shuffle the list of files before writing to the split file. That should help reduce the variance.
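
        A minimal sketch (hypothetical names, not the attached patch) of where a shuffle like this could sit in the code that writes the split file:

            import java.util.Collections;
            import java.util.List;

            // Hypothetical helper, not actual DistRaid code. The point is the single
            // Collections.shuffle(...) call before the file list is written out, so that
            // each contiguous split of the split file gets a mix of large and small files
            // instead of a run of similar-sized neighbors from the directory listing.
            class SplitFileWriter {

              /** Stand-in for whatever persists the entries (e.g. a SequenceFile writer). */
              interface SplitSink {
                void append(String file);
              }

              static void writeSplitFile(List<String> filesToRaid, SplitSink sink) {
                Collections.shuffle(filesToRaid);   // randomize order before writing
                for (String file : filesToRaid) {
                  sink.append(file);
                }
              }
            }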

        dhruba borthakur added a comment -

        > One option is to simply shuffle the list of files before writing to the split file. That should help reduce the variance

        That sounds like a fine idea.

        dhruba borthakur added a comment -

        The random shuffle will work only if speculative execution is switched on for this job. Otherwise there is a non-trivial probability that one long-running mapper will delay job completion to a large extent.

        Dmytro Molkov added a comment -

        @dhruba
        I do not fully understand your comment on speculative execution. The problem right now is that too many large files end up in one mapper while all the other mappers get small inputs. As a result, those huge mappers keep running for a while after all the other mappers have finished quickly. I am not sure how speculative execution can help in this case, since these map tasks have to run 5 times longer than the other maps in any case.

        dhruba borthakur added a comment -

        I agree, speculation actually does not make this go any faster!

        Ramkumar Vadali added a comment -

        Shuffle files to raid before submitting the raid job.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12448455/MAPREDUCE-1838.patch
        against trunk revision 959509.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/277/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/277/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/277/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/277/console

        This message is automatically generated.

        Ramkumar Vadali added a comment -

        This is a one-line change that shuffles the collection before writing it to the split file, so the existing unit tests should be enough.
        The failure in the contrib tests does not seem to be an actual test failure.

        Rodrigo Schmidt added a comment -

        +1
        Agree that no unit tests are needed in this case.

        dhruba borthakur added a comment -

        I just committed this. Thanks Ram!

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/)


          People

          • Assignee: Ramkumar Vadali
          • Reporter: Ramkumar Vadali
          • Votes: 0
          • Watchers: 4
