[MAPREDUCE-5958] Wrong reduce task progress if map output is compressed - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
Fix Version/s: 2.6.0
Component/s: None
Labels: progress, reduce
Target Version/s: 2.6.0
Hadoop Flags: Reviewed

Description

If the map output is compressed (mapreduce.map.output.compress set to true), the reduce task progress may be significantly underestimated.

In the reduce phase (and also in the merge phase), the progress of a reduce task is computed as the ratio of processed bytes to total bytes. However:

- the total number of bytes is computed by summing the uncompressed segment sizes (Merger.Segment.getRawDataLength())
- the number of processed bytes is taken from the position of the current IFile.Reader (IFile.Reader.getPosition()), which may refer to the position in the underlying on-disk file (which may be compressed)

Thus, if the map outputs are compressed, the progress may be underestimated. For example, with a single on-disk map output file whose compressed size is 25% of its original size, the reduce task progress during the reduce phase ranges between 0 and 0.25 and then artificially jumps to 1.0.
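
The arithmetic behind the 25% example can be sketched as follows. This is only an illustration, not Hadoop code: the class ReduceProgressSketch, the method progress, and the byte counts (1000 raw bytes, 250 compressed bytes) are hypothetical names and values chosen to match the example above.

```java
// Hypothetical illustration of the ratio described above; not Hadoop code.
class ReduceProgressSketch {
    /** Ratio of processed bytes to total bytes, clamped to [0, 1]. */
    static double progress(long processedBytes, long totalBytes) {
        if (totalBytes <= 0) {
            return 1.0; // nothing to process
        }
        return Math.min(1.0, Math.max(0.0, (double) processedBytes / totalBytes));
    }

    public static void main(String[] args) {
        long rawLength = 1000L;       // assumed uncompressed segment size (the denominator)
        long compressedLength = 250L; // assumed on-disk size, 25% of the original

        // Numerator taken from the on-disk position: it can never exceed compressedLength.
        for (long pos = 0; pos <= compressedLength; pos += 50) {
            System.out.println("on-disk position " + pos + " -> " + progress(pos, rawLength));
        }
        // Numerator taken from uncompressed bytes consumed: it can reach rawLength.
        for (long read = 0; read <= rawLength; read += 200) {
            System.out.println("uncompressed bytes " + read + " -> " + progress(read, rawLength));
        }
    }
}
```

With the on-disk position as numerator, the ratio stops at compressedLength / rawLength, which is why the reported progress has to jump to 1.0 once the task completes.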

A patch is attached: the number of processed bytes is now computed from IFile.Reader.bytesRead (if the reader is in memory, getPosition() already returns exactly this field).
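
The difference between a position in the compressed file and a count of bytes actually read can be reproduced outside Hadoop. The sketch below is an analogy, not the patch itself: it uses java.util.zip's GZIP streams in place of IFile.Reader and its codec, and the names BytesReadSketch, CountingInputStream, and measure are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for a compressed map output; not Hadoop's IFile.
class BytesReadSketch {
    /** Counts bytes pulled from the underlying compressed stream (analogous to an on-disk position). */
    static final class CountingInputStream extends FilterInputStream {
        long count;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                count++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                count += n;
            }
            return n;
        }
    }

    /** Compresses raw, reads it back, and returns {compressedPosition, uncompressedBytesRead}. */
    static long[] measure(byte[] raw) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(raw);
            }
            CountingInputStream disk =
                new CountingInputStream(new ByteArrayInputStream(bos.toByteArray()));
            long bytesRead = 0; // uncompressed bytes handed to the consumer
            try (GZIPInputStream in = new GZIPInputStream(disk)) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    bytesRead += n;
                }
            }
            return new long[] { disk.count, bytesRead };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] result = measure(new byte[100_000]); // highly compressible input (all zeros)
        System.out.println("compressed position: " + result[0]);
        System.out.println("uncompressed bytes read: " + result[1]);
    }
}
```

Only the second counter is measured in the same unit as the uncompressed total, so only it can serve as the numerator of a ratio that reaches 1.0.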

Attachments

- MAPREDUCE-5958v3.patch (30/Oct/14 20:50, 11 kB, Jason Darrell Lowe)
- HADOOP-5958-v2.patch (05/Jul/14 16:47, 2 kB, Emilio Coppa)

Issue Links

relates to: MAPREDUCE-5760 Reduce task percentage is going beyond 100% for a job (Open)

People

Assignee: Emilio Coppa

Reporter: Emilio Coppa

Votes: 0

Watchers: 8

Dates

Created: 05/Jul/14 16:21

Updated: 01/Dec/14 03:11

Resolved: 06/Nov/14 21:58