  1. Hadoop Map/Reduce
  2. MAPREDUCE-6367

UniformSizeInputFormat skews left over bytes to last split

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Invalid
    • Affects Version/s: 2.6.0, 2.5.2
    • None
    • None
    • None
    • Sorry this jira is not needed

    Description

      UniformSizeInputFormat tries to assign an equal number of bytes to every split. But with the current logic, every split ends up with a little less than the ideal amount, and the leftover from every split is put into the last split.

      This results in a large skew for the last split.

      Below is the area of the code that is affected:

      https://github.com/apache/hadoop/blob/9ae7f9eb7baeb244e1b95aabc93ad8124870b9a9/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/UniformSizeInputFormat.java#L98

      The fix would be to change the following line:

      currentSplitSize += srcFileStatus.getLen();

      to

      currentSplitSize += srcFileStatus.getLen() + (currentSplitSize - nBytesPerSplit);
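
      A minimal sketch of a greedy, size-based split loop, for experimenting with how bytes get distributed across splits. It is not the Hadoop source: the class SplitSketch, the method splitTotals, and the close-the-split-when-over-target condition are assumptions made for illustration. Only the names currentSplitSize and nBytesPerSplit and the += line come from the description above, and plain file lengths stand in for srcFileStatus.getLen().

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Hypothetical model of the split loop; returns the byte total of each split.
    static List<Long> splitTotals(long[] fileLens, int numSplits) {
        long total = 0;
        for (long len : fileLens) {
            total += len;
        }
        // Target size per split (assumed: total bytes divided evenly, rounded up).
        long nBytesPerSplit = (long) Math.ceil(total * 1.0 / numSplits);

        List<Long> totals = new ArrayList<>();
        long currentSplitSize = 0;
        for (long len : fileLens) {
            // Assumed rule: close the current split if this file would push it past the target.
            if (currentSplitSize + len > nBytesPerSplit && currentSplitSize > 0) {
                totals.add(currentSplitSize);
                currentSplitSize = 0;
            }
            currentSplitSize += len; // counterpart of the line quoted in the description
        }
        // Whatever is still open at the end becomes the final split.
        if (currentSplitSize > 0) {
            totals.add(currentSplitSize);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Five 40-byte files, two requested splits: target is 100 bytes per split.
        System.out.println(splitTotals(new long[] {40, 40, 40, 40, 40}, 2));
    }
}
```

      With five 40-byte files and two requested splits, this model prints [80, 80, 40]: each split stops short of the 100-byte target and the remainder forms an additional split. Whether the real class behaves the same way should be checked against the linked source.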

          People

            Assignee: Theodore michael Malaska (ted.m)
            Reporter: Theodore michael Malaska (ted.m)
