MAPREDUCE-3710 (Hadoop Map/Reduce)

Last split generated by FileInputFormat.getSplits may not have the best locality

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0, 1.0.0
    • Fix Version/s: 0.23.1
    • Component/s: mrv1, mrv2
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Improved FileInputFormat to return better locality for the last split.

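    Description

    The last split generated by FileInputFormat.getSplits uses the hosts of the last block in the file (blkLocations[blkLocations.length-1]) as its locations.
    Because of SPLIT_SLOP (1.1 by default), the last split may be up to 10% larger than the other splits and can therefore span more than one block. In that case its locality is taken from the small trailing block rather than from the block that holds most of the split's data.
    For example, take a 1027MB file with 128MB blocks and a 128MB split size. The last split ends up being 131MB, and its locality hosts are the nodes holding the trailing 3MB block instead of the nodes holding the 128MB block.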
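To make the example concrete, here is a minimal, Hadoop-free Java sketch that models the split loop with SPLIT_SLOP = 1.1 and compares two ways of choosing the host for the last split: the last block in the file (the behavior reported above) and the block that contains the split's first byte. Block, Split, uniformBlocks, getSplits, blockIndex and the host0..hostN names are hypothetical; real block locations carry several hosts per block, and this is a sketch of one plausible fix under those assumptions, not the attached MR-3710_v1.txt patch. It uses records, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how FileInputFormat.getSplits picks hosts; not Hadoop's API. */
public class SplitLocalitySketch {
    static final double SPLIT_SLOP = 1.1; // default slop ratio described in the issue

    // Simplification: one host per block (real BlockLocations list several hosts).
    record Block(long offset, long length, String host) {}
    record Split(long start, long length, String host) {}

    /** Builds blockSize-byte blocks (the last may be shorter) on hosts host0, host1, ... */
    static List<Block> uniformBlocks(long fileLength, long blockSize) {
        List<Block> blocks = new ArrayList<>();
        int i = 0;
        for (long off = 0; off < fileLength; off += blockSize, i++) {
            blocks.add(new Block(off, Math.min(blockSize, fileLength - off), "host" + i));
        }
        return blocks;
    }

    /** Index of the block whose byte range contains the given file offset. */
    static int blockIndex(List<Block> blocks, long offset) {
        for (int i = 0; i < blocks.size(); i++) {
            Block b = blocks.get(i);
            if (offset >= b.offset() && offset < b.offset() + b.length()) {
                return i;
            }
        }
        throw new IllegalArgumentException("offset " + offset + " is outside the file");
    }

    /**
     * Cuts full-size splits while the remainder exceeds SPLIT_SLOP * splitSize,
     * then emits the remainder as one (possibly oversized) last split.
     * useLastBlock = true reproduces the reported behavior for the last split;
     * false takes the host of the block containing the split's first byte.
     */
    static List<Split> getSplits(List<Block> blocks, long fileLength, long splitSize,
                                 boolean useLastBlock) {
        List<Split> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            long start = fileLength - remaining;
            splits.add(new Split(start, splitSize, blocks.get(blockIndex(blocks, start)).host()));
            remaining -= splitSize;
        }
        if (remaining != 0) {
            long start = fileLength - remaining;
            int idx = useLastBlock ? blocks.size() - 1 : blockIndex(blocks, start);
            splits.add(new Split(start, remaining, blocks.get(idx).host()));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 1027 * mb; // the 1027MB file from the description
        long size = 128 * mb;        // 128MB blocks and 128MB splits
        List<Block> blocks = uniformBlocks(fileLength, size);
        List<Split> buggy = getSplits(blocks, fileLength, size, true);
        List<Split> fixed = getSplits(blocks, fileLength, size, false);
        Split lastBuggy = buggy.get(buggy.size() - 1);
        Split lastFixed = fixed.get(fixed.size() - 1);
        System.out.println("splits: " + fixed.size() + ", last split: "
                + lastFixed.length() / mb + "MB at offset " + lastFixed.start() / mb + "MB");
        System.out.println("last-block host: " + lastBuggy.host()
                + ", containing-block host: " + lastFixed.host());
    }
}
```

Running main should report 8 splits with a 131MB last split starting at 896MB, whose host is host8 (the 3MB block) under the reported behavior and host7 (the 128MB block) when the containing block is used.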

    Attachments

    1. MR-3710_v1.txt (13 kB, Siddharth Seth)

    People

    • Assignee: Siddharth Seth (sseth)
    • Reporter: Siddharth Seth (sseth)
    • Votes: 0
    • Watchers: 0