Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Invalid
Affects Version/s: 2.6.0, 2.5.2
Fix Version/s: None
Component/s: None
Labels: None
Sorry, this JIRA is not needed.
Description
UniformSizeInputFormat tries to assign an equal number of bytes to every split. With the current logic, however, every split ends up with slightly less than the ideal amount, and the leftover from each split is pushed into the last split.
This results in a large skew for the last split.
Below is the area of the code that is affected:
The fix would be to change the following line:
currentSplitSize += srcFileStatus.getLen();
to
currentSplitSize += srcFileStatus.getLen() + (currentSplitSize - nBytesPerSplit);
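To check how bytes actually distribute across splits, the accumulation can be modelled in isolation. The following is only a minimal sketch, assuming a greedy loop that closes the current split whenever adding the next file would exceed nBytesPerSplit. It is not the Hadoop source: the class SplitSketch, the method splitSizes, and the sample file lengths are hypothetical, and only the names currentSplitSize and nBytesPerSplit come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of greedy, size-based splitting.
// It does not use any Hadoop classes and is not the DistCp implementation.
class SplitSketch {

    // Returns the number of bytes that ends up in each split.
    static List<Long> splitSizes(long[] fileLens, long nBytesPerSplit) {
        List<Long> sizes = new ArrayList<>();
        long currentSplitSize = 0;
        boolean splitHasFiles = false;

        for (long len : fileLens) {
            // Close the current split if this file would push it past the target,
            // but never emit an empty split.
            if (currentSplitSize + len > nBytesPerSplit && splitHasFiles) {
                sizes.add(currentSplitSize);
                currentSplitSize = 0;
            }
            currentSplitSize += len;
            splitHasFiles = true;
        }

        // Emit whatever remains as the final split.
        if (splitHasFiles) {
            sizes.add(currentSplitSize);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Four files of 6 bytes each, 3 requested splits: target = ceil(24 / 3) = 8.
        long[] fileLens = {6, 6, 6, 6};
        long nBytesPerSplit = 8;
        System.out.println(splitSizes(fileLens, nBytesPerSplit)); // prints [6, 6, 6, 6]
    }
}
```

Under these assumptions each split stays at or below the target and the shortfall shows up as an extra split rather than being added to the last one. Running the same model with real file lengths is a quick way to test whether the skew described above occurs.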