Hadoop Map/Reduce / MAPREDUCE-1752

Implement getFileBlockLocations in HarFilesystem

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: harchive
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      To run MapReduce efficiently on data that has been archived into a HAR, it would be great to implement getFileBlockLocations for a given filename.
      This way the JobTracker will have data-locality information and can schedule tasks accordingly.
      I believe the overhead of doing lookups in the index files can be smaller than the cost of copying data over the wire.
      I will upload the patch shortly, but would love to get some feedback on this. Any ideas on how to test it are very welcome.
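
One way to picture the idea is as an offset translation: a file inside a HAR is stored as a byte range of a larger part file, so the part file's block locations can be clipped to that range and shifted so that offsets are relative to the archived file. The following is a minimal sketch under that assumption, not the code from the attached patches. `Block` is a hypothetical stand-in for Hadoop's `BlockLocation`, and `toFileRelative` and `fileStartInPart` are illustrative names; in a real implementation the start offset would come from the index lookup mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HarBlockMappingSketch {

    /** Hypothetical, simplified stand-in for a block location. */
    static final class Block {
        final long offset;    // start offset of the block
        final long length;    // number of bytes in the block
        final String[] hosts; // hosts that store a replica

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /**
     * Clips the part file's blocks to the requested range of an archived
     * file and rewrites their offsets relative to the archived file.
     *
     * @param partBlocks      blocks of the underlying part file
     * @param fileStartInPart offset of the archived file inside the part file
     * @param start           start of the requested range within the archived file
     * @param len             length of the requested range
     */
    static List<Block> toFileRelative(List<Block> partBlocks,
                                      long fileStartInPart,
                                      long start,
                                      long len) {
        long rangeStart = fileStartInPart + start;
        long rangeEnd = rangeStart + len;
        List<Block> result = new ArrayList<>();
        for (Block b : partBlocks) {
            long overlapStart = Math.max(b.offset, rangeStart);
            long overlapEnd = Math.min(b.offset + b.length, rangeEnd);
            if (overlapStart >= overlapEnd) {
                continue; // block does not intersect the requested range
            }
            result.add(new Block(overlapStart - fileStartInPart,
                                 overlapEnd - overlapStart,
                                 b.hosts));
        }
        return result;
    }

    public static void main(String[] args) {
        // Part file made of three 100-byte blocks on different hosts.
        List<Block> partBlocks = Arrays.asList(
                new Block(0, 100, "hostA"),
                new Block(100, 100, "hostB"),
                new Block(200, 100, "hostC"));
        // Archived file occupying bytes 150..250 of the part file.
        for (Block b : toFileRelative(partBlocks, 150, 0, 100)) {
            System.out.println(b.offset + " " + b.length + " " + String.join(",", b.hosts));
        }
    }
}
```

With the sample data in `main`, the archived file spans the second half of the hostB block and the first half of the hostC block, so a scheduler that sees the translated locations could place a task on either host instead of reading the data remotely.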

        Attachments

        1. MR-1752.patch
          2 kB
          Dmytro Molkov
        2. MAPREDUCE-1752.2.patch
          5 kB
          Dmytro Molkov
        3. MAPREDUCE-1752.3.patch
          11 kB
          Patrick Kling

              People

              • Assignee: Dmytro Molkov (dms)
              • Reporter: Dmytro Molkov (dms)
              • Votes: 0
              • Watchers: 11
