MAPREDUCE-1752 (project: Hadoop Map/Reduce)

Implement getFileBlockLocations in HarFilesystem


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: harchive
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      To run MapReduce efficiently on data that has been archived with HAR, it would be useful to implement getFileBlockLocations for files stored inside the archive.
      The JobTracker would then have data-locality information and could schedule tasks on the nodes that hold the relevant blocks.
      I believe the overhead of the extra lookups in the archive's index files would be smaller than the cost of copying the data over the network.
      I will upload a patch shortly, but would appreciate feedback on the approach. Any ideas on how to test it are also very welcome.
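      The core of the idea is a coordinate translation. An archived file occupies a contiguous byte range inside one of the archive's part files, so its block locations can be derived from the part file's block locations. Below is a minimal sketch, assuming the HAR index gives, for each archived file, the part file that holds it plus its start offset and length within that part file. `Block` is a hypothetical stand-in for Hadoop's `BlockLocation`, and `toArchivedFileBlocks` is an illustrative helper. Neither is the code from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

public class HarBlockMapping {
    // Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation: hosts plus a byte range.
    public record Block(String[] hosts, long offset, long length) {}

    /**
     * Hypothetical helper. Translates the block locations of a part file into locations
     * relative to an archived file stored at [fileStartInPart, fileStartInPart + fileLen)
     * inside that part file.
     *
     * partBlocks: blocks of the part file (offsets relative to the part file)
     * start, len: requested byte range, relative to the archived file
     */
    public static List<Block> toArchivedFileBlocks(List<Block> partBlocks, long fileStartInPart,
                                                   long fileLen, long start, long len) {
        List<Block> result = new ArrayList<>();
        // Clip the request to the archived file's extent.
        long reqStart = Math.max(0, start);
        long reqEnd = Math.min(fileLen, start + len);
        if (reqStart >= reqEnd) {
            return result;
        }
        // The same range expressed in part-file coordinates.
        long partReqStart = fileStartInPart + reqStart;
        long partReqEnd = fileStartInPart + reqEnd;
        for (Block b : partBlocks) {
            long overlapStart = Math.max(b.offset(), partReqStart);
            long overlapEnd = Math.min(b.offset() + b.length(), partReqEnd);
            if (overlapStart < overlapEnd) {
                // Keep the block's hosts, but shift the range back into archived-file coordinates.
                result.add(new Block(b.hosts(), overlapStart - fileStartInPart, overlapEnd - overlapStart));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical part file made of two 128-byte blocks stored on different hosts.
        List<Block> part = List.of(
                new Block(new String[]{"host1"}, 0, 128),
                new Block(new String[]{"host2"}, 128, 128));
        // Archived file stored at bytes [100, 200) of the part file; ask for the whole file.
        for (Block b : toArchivedFileBlocks(part, 100, 100, 0, 100)) {
            System.out.println(String.join(",", b.hosts()) + " offset=" + b.offset() + " length=" + b.length());
        }
    }
}
```

      In `main`, the archived file occupies bytes 100 to 199 of the part file, so the sketch reports 28 bytes on `host1` and 72 bytes on `host2`. Clipping each block to the requested range keeps the reported offsets and lengths relative to the archived file rather than the part file, which is what a scheduler computing splits over the archived file would expect.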

      Attachments

        1. MR-1752.patch
          2 kB
          Dmytro Molkov
        2. MAPREDUCE-1752.3.patch
          11 kB
          Patrick Kling
        3. MAPREDUCE-1752.2.patch
          5 kB
          Dmytro Molkov

            People

              Assignee: Dmytro Molkov
              Reporter: Dmytro Molkov
              Votes: 0
              Watchers: 11
