  1. Hadoop Map/Reduce
  2. MAPREDUCE-4593

CombineFileInputFormat must ensure it doesn't dupe locations in its InputSplit objects

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: None
    • Component/s: client
    • Labels: None

    Description

      Currently it seems possible for CombineFileInputFormat's InputSplit objects to grow very large, because the locations field is not de-duplicated. We should probably use a set structure to keep duplicate locations from inflating the block-location list of the InputSplits that CombineFileInputFormat produces. That would help performance and avoid unnecessary warnings/errors about block location limits at the JT/MR AM.
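
      The set-based idea above can be sketched roughly as follows. This is only an illustration in plain Java, not the actual Hadoop code or the fix that was eventually applied: the class name LocationDedupSketch and the method dedupeLocations are hypothetical, and the input is assumed to be one host array per block that ends up in a combined split.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: collapses the per-block host
// lists of a combined split into one de-duplicated locations array.
final class LocationDedupSketch {

    // Assumption: each element of perBlockHosts holds the hosts that
    // store one block included in the combined split.
    static String[] dedupeLocations(List<String[]> perBlockHosts) {
        // LinkedHashSet drops repeats while keeping first-seen order.
        Set<String> unique = new LinkedHashSet<>();
        for (String[] hosts : perBlockHosts) {
            unique.addAll(Arrays.asList(hosts));
        }
        return unique.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String[]> perBlockHosts = Arrays.asList(
            new String[] {"host1", "host2"},
            new String[] {"host2", "host3"},
            new String[] {"host1", "host3"});
        // Six host entries go in, three unique locations come out.
        System.out.println(Arrays.toString(dedupeLocations(perBlockHosts)));
        // prints [host1, host2, host3]
    }
}
```

      With de-duplication, the locations array is bounded by the number of distinct hosts rather than by the number of blocks times the replication factor, which is what keeps the split under the block location limits mentioned above.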

            People

              Assignee: Unassigned
              Reporter: Harsh J (qwertymaniac)
              Votes: 0
              Watchers: 1
