Hadoop Map/Reduce / MAPREDUCE-3710

last split generated by FileInputFormat.getSplits may not have the best locality


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0, 1.0.0
    • Fix Version/s: 0.23.1
    • Component/s: mrv1, mrv2
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: Improved FileInputFormat to return better locality for the last split.

    Description

      The last split generated by FileInputFormat.getSplits takes its hosts from the last block of the file (blkLocations[blkLocations.length-1]).
      However, the last split may be larger than the others (up to SPLIT_SLOP=1.1 times the split size by default), in which case its locality is taken from a block that may hold only a small part of the split's data.
      For example, with a 1027MB file and a 128MB split size, the last split ends up being 131MB and covers a 128MB block and a 3MB block. The hosts chosen for locality are the nodes containing the 3MB block instead of the 128MB block.
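
The scenario in the description can be reproduced with a small, self-contained model. This is a simplified, hypothetical sketch, not the Hadoop source or the attached patch: the class and method names (LastSplitLocality, blocksFor, splits, blockIndex), the one-host-per-block layout and the host labels (host-0, host-1, ...) are all illustrative assumptions, and block size is assumed equal to split size. Passing useStartBlock=false mimics the reported behaviour (hosts of the file's last block); useStartBlock=true shows one possible alternative, picking the block that contains the last split's start offset.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the split loop in FileInputFormat.getSplits.
// Each block has one host label here; a real BlockLocation can list several hosts.
public class LastSplitLocality {
    static final double SPLIT_SLOP = 1.1; // default slop factor cited in the issue

    record Block(long offset, long length, String host) {}
    record Split(long start, long length, String host) {}

    // Returns the index of the block that contains the given byte offset.
    static int blockIndex(List<Block> blocks, long offset) {
        for (int i = 0; i < blocks.size(); i++) {
            Block b = blocks.get(i);
            if (offset >= b.offset() && offset < b.offset() + b.length()) {
                return i;
            }
        }
        throw new IllegalArgumentException("offset " + offset + " is not inside any block");
    }

    // Cuts the file into fixed-size splits; the tail (at most SPLIT_SLOP * splitSize) becomes one last split.
    // useStartBlock == false: last split uses the hosts of the file's last block (reported behaviour).
    // useStartBlock == true: last split uses the block containing its start offset (hypothetical alternative).
    static List<Split> splits(List<Block> blocks, long fileLength, long splitSize, boolean useStartBlock) {
        List<Split> result = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            long start = fileLength - remaining;
            result.add(new Split(start, splitSize, blocks.get(blockIndex(blocks, start)).host()));
            remaining -= splitSize;
        }
        if (remaining != 0) {
            long start = fileLength - remaining;
            int idx = useStartBlock ? blockIndex(blocks, start) : blocks.size() - 1;
            result.add(new Split(start, remaining, blocks.get(idx).host()));
        }
        return result;
    }

    // Lays the file out as consecutive blocks stored on host-0, host-1, ... (hypothetical host names).
    static List<Block> blocksFor(long fileLength, long blockSize) {
        List<Block> blocks = new ArrayList<>();
        for (long off = 0, i = 0; off < fileLength; off += blockSize, i++) {
            blocks.add(new Block(off, Math.min(blockSize, fileLength - off), "host-" + i));
        }
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 1027 * mb;
        long splitSize = 128 * mb; // block size == split size in this example
        List<Block> blocks = blocksFor(fileLength, splitSize);

        List<Split> before = splits(blocks, fileLength, splitSize, false);
        List<Split> after = splits(blocks, fileLength, splitSize, true);
        Split last = before.get(before.size() - 1);

        System.out.println("last split: " + last.length() / mb + "MB starting at " + last.start() / mb + "MB");
        System.out.println("hosts from last block:  " + last.host());
        System.out.println("hosts from start block: " + after.get(after.size() - 1).host());
    }
}
```

With the issue's numbers this yields seven 128MB splits and a 131MB last split starting at 896MB. The last-block variant reports host-8 (the 3MB block), while the start-block variant reports host-7 (the 128MB block). Choosing the block that contains the split's start is just one simple heuristic; weighing blocks by how many of the split's bytes they hold would be another.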

      Attachments

        1. MR-3710_v1.txt (13 kB, Siddharth Seth)


            People

              Assignee: Siddharth Seth (sseth)
              Reporter: Siddharth Seth (sseth)
              Votes: 0
              Watchers: 0
