[MAPREDUCE-6367] UniformSizeInputFormat skews left over bytes to last split - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Invalid
Affects Version/s: 2.6.0, 2.5.2
Fix Version/s: None
Component/s: None
Labels: None

Release Note:
Sorry, this JIRA is not needed.

Description

UniformSizeInputFormat tries to assign an equal number of bytes to every split. However, the current logic leaves every split slightly short of the ideal size, and the bytes left over from each split all end up in the last split, resulting in a large skew for that split.

Below is the area of the code that is affected:

https://github.com/apache/hadoop/blob/9ae7f9eb7baeb244e1b95aabc93ad8124870b9a9/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/UniformSizeInputFormat.java#L98

The fix would be to change the following line:

currentSplitSize += srcFileStatus.getLen();

to:

currentSplitSize += srcFileStatus.getLen() + (currentSplitSize - nBytesPerSplit);
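To experiment with the split logic discussed above, here is a simplified, standalone Java model of the loop with the unmodified `currentSplitSize += srcFileStatus.getLen();` line. It is a sketch, not the Hadoop code: `GreedySplitModel` and `splitSizes` are hypothetical names, and the model assumes the per-split target is `nBytesPerSplit = ceil(totalBytes / numSplits)` and that a split is closed whenever adding the next file would push it past that target. The linked source is authoritative for the exact conditions; this model tracks only byte totals, not split offsets.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of a greedy "uniform size" split loop.
 * Not the Hadoop implementation: it tracks only byte totals per split.
 */
public class GreedySplitModel {

    /** Returns the number of bytes assigned to each split, in listing order. */
    public static List<Long> splitSizes(long[] fileSizes, int numSplits) {
        long totalBytes = 0;
        for (long len : fileSizes) {
            totalBytes += len;
        }
        // Assumed per-split target: total bytes / requested splits, rounded up.
        long nBytesPerSplit = (long) Math.ceil(totalBytes * 1.0 / numSplits);

        List<Long> splits = new ArrayList<>();
        long currentSplitSize = 0;
        for (long len : fileSizes) {
            // Close the current split if adding this file would exceed the
            // target; an empty split always accepts the file, so a file larger
            // than the target ends up alone in its own split.
            if (currentSplitSize + len > nBytesPerSplit && currentSplitSize > 0) {
                splits.add(currentSplitSize);
                currentSplitSize = 0;
            }
            // Mirrors the unmodified currentSplitSize += srcFileStatus.getLen();
            currentSplitSize += len;
        }
        // Bytes still pending after the last file form the final split.
        // (Only byte totals are tracked, so trailing zero-length files
        // do not produce a split of their own here.)
        if (currentSplitSize > 0) {
            splits.add(currentSplitSize);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Ten 3-byte files, 4 requested splits: target = ceil(30 / 4) = 8.
        System.out.println(splitSizes(new long[] {3, 3, 3, 3, 3, 3, 3, 3, 3, 3}, 4));
        // Mixed sizes, 3 requested splits: target = ceil(25 / 3) = 9.
        System.out.println(splitSizes(new long[] {5, 4, 6, 2, 7, 1}, 3));
    }
}
```

Running `main` prints `[6, 6, 6, 6, 6]` and `[9, 8, 8]`. In this model no split, the last one included, grows past `nBytesPerSplit` unless a single file is larger than the target; bytes that do not fit are carried into additional splits instead.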

People

Assignee: Theodore michael Malaska
Reporter: Theodore michael Malaska
Votes: 0
Watchers: 3

Dates

Created: 15/May/15 20:39
Updated: 18/May/15 00:39
Resolved: 18/May/15 00:39