Hadoop Map/Reduce
  MAPREDUCE-3710

Last split generated by FileInputFormat.getSplits may not have the best locality

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0, 1.0.0
    • Fix Version/s: 0.23.1
    • Component/s: mrv1, mrv2
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Improved FileInputFormat to return better locality for the last split.

      Description

      The last split generated by FileInputFormat.getSplits takes its hosts from the last block of the file (index blkLocations.length-1).
      The last split may be larger than the rest, up to SPLIT_SLOP (1.1 by default) times the split size, in which case it spans two blocks and its locality is taken from the smaller one.
      For example, take a 1027MB file with a 128MB split size. The last split ends up being 131MB, and the hosts used for locality are the nodes holding the trailing 3MB block instead of the 128MB block that makes up most of the split.
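
      The arithmetic in the description can be reproduced with a small standalone sketch. It does not call the Hadoop API: the class SplitLocalitySketch and its methods (computeSplits, blockCount, blockLength, blockIndexForOffset) are hypothetical names, the split loop is written only from the behaviour described in this issue (SPLIT_SLOP=1.1), and fixed-size 128MB blocks are assumed. Choosing the block that contains the split's start offset is shown as one possible way to get better locality, not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration; all names here are hypothetical and none come from Hadoop.
class SplitLocalitySketch {
    // Default slop factor quoted in the issue description.
    static final double SPLIT_SLOP = 1.1;

    // Returns {offset, length} pairs. Full-size splits are cut while the
    // remainder is more than SPLIT_SLOP times the split size; whatever is
    // left becomes the (possibly oversized) last split.
    static List<long[]> computeSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long bytesRemaining = fileLength;
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - bytesRemaining, splitSize});
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits.add(new long[] {fileLength - bytesRemaining, bytesRemaining});
        }
        return splits;
    }

    // Number of blocks, assuming fixed-size blocks with a short final block.
    static int blockCount(long fileLength, long blockSize) {
        return (int) ((fileLength + blockSize - 1) / blockSize);
    }

    // Length of the block at the given index under the same assumption.
    static long blockLength(int index, long fileLength, long blockSize) {
        return Math.min(blockSize, fileLength - (long) index * blockSize);
    }

    // Index of the block that contains the given byte offset.
    static int blockIndexForOffset(long offset, long blockSize) {
        return (int) (offset / blockSize);
    }

    public static void main(String[] args) {
        long MB = 1024L * 1024L;
        long fileLength = 1027 * MB;
        long blockSize = 128 * MB; // split size equals block size in the example

        List<long[]> splits = computeSplits(fileLength, blockSize);
        long[] last = splits.get(splits.size() - 1);

        // Behaviour reported in the issue: hosts come from the last block.
        int lastBlock = blockCount(fileLength, blockSize) - 1;
        // Alternative: hosts come from the block where the split starts.
        int startBlock = blockIndexForOffset(last[0], blockSize);

        System.out.println("splits: " + splits.size());
        System.out.println("last split: offset " + last[0] / MB
                + "MB, length " + last[1] / MB + "MB");
        System.out.println("last block: index " + lastBlock + ", "
                + blockLength(lastBlock, fileLength, blockSize) / MB + "MB");
        System.out.println("block at split start: index " + startBlock + ", "
                + blockLength(startBlock, fileLength, blockSize) / MB + "MB");
    }
}
```

      For the 1027MB file this gives 8 splits, with the last one starting at 896MB and covering 131MB. The last block (index 8) holds only 3MB of that split, while the block at the split's start offset (index 7) holds 128MB of it.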

      1. MR-3710_v1.txt (13 kB, Siddharth Seth)

            People

            • Assignee: Siddharth Seth
            • Reporter: Siddharth Seth