Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: harchive
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Each HAR file system has two index files that contain information on how files are stored in the part files. During block location calculation, these indexes are reread for every file in the archive. Caching the indexes and the status of the part files will greatly reduce the number of NameNode operations during job setup.
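
      The caching idea described above can be illustrated with a minimal sketch. This is not the code from the attached patches: the class `HarMetadataCacheSketch`, the `ArchiveMetadata` type, and the loader function are hypothetical names chosen for illustration, and the loader stands in for whatever actually reads and parses the two index files. The sketch only shows the general pattern of loading metadata once per archive and reusing it for every subsequent file lookup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of per-archive metadata caching; not the patch itself. */
public class HarMetadataCacheSketch {

    /** Stand-in for the parsed contents of an archive's index files (assumed shape). */
    public static final class ArchiveMetadata {
        private final Map<String, String> entryToPartFile;

        public ArchiveMetadata(Map<String, String> entryToPartFile) {
            // Defensive copy so cached metadata cannot be mutated by callers.
            this.entryToPartFile =
                Collections.unmodifiableMap(new HashMap<>(entryToPartFile));
        }

        /** Returns the part file holding the given entry, or null if unknown. */
        public String partFileFor(String entry) {
            return entryToPartFile.get(entry);
        }
    }

    // One cached metadata object per archive path.
    private final ConcurrentHashMap<String, ArchiveMetadata> cache =
        new ConcurrentHashMap<>();

    // The expensive step: in a real file system this would read the index files.
    private final Function<String, ArchiveMetadata> loader;

    public HarMetadataCacheSketch(Function<String, ArchiveMetadata> loader) {
        this.loader = loader;
    }

    /** Loads metadata on the first request for an archive and reuses it afterwards. */
    public ArchiveMetadata get(String archivePath) {
        return cache.computeIfAbsent(archivePath, loader);
    }
}
```

      With this pattern, looking up block locations for N files in one archive invokes the loader once rather than N times. A real implementation would also need to decide when cached entries become stale, for example if the archive is replaced; the sketch does not address that.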

        Attachments

        1. MAPREDUCE-2459.1.patch
          15 kB
          Mac Yang
        2. MAPREDUCE-2459.2.patch
          16 kB
          Mac Yang

              People

              • Assignee:
                macyang Mac Yang
                Reporter:
                macyang Mac Yang
              • Votes:
                0
                Watchers:
                6
