Hadoop Map/Reduce
MAPREDUCE-2212

MapTask and ReduceTask should only compress/decompress the final map output file

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: task
    • Labels:
      None

      Description

      Currently, if we set mapred.map.output.compression.codec:
      1. MapTask compresses every spill, decompresses every spill to merge them, and then compresses the final map output file.
      2. ReduceTask decompresses, merges, and recompresses every map output file, and repeats the compression/decompression on every merge pass.
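      To make the cost concrete, here is a rough cost model that counts how many bytes pass through the codec (in either direction) under the current behaviour and under the proposed one. It is a simplified sketch, not Hadoop code: it assumes each map task writes `spills` equal-sized spills of uncompressed size `spillBytes` and merges them in a single pass (a single spill is assumed to be written once and not re-merged), and that the reduce side makes `mergePasses` intermediate on-disk passes over its whole input of `reduceBytes`. The class, method, and parameter names are hypothetical, and the sizes in `main` are illustrative, not measurements.

```java
/**
 * Hypothetical, simplified cost model for MAPREDUCE-2212.
 * Counts bytes fed through the compression codec, compressing or decompressing.
 */
public final class CodecCostModel {

    /** Current MapTask: compress each spill, decompress each spill to merge, compress the final file. */
    static long currentMapSide(int spills, long spillBytes) {
        long total = spills * spillBytes;
        if (spills == 1) {
            return total;              // assumed: a lone spill is compressed once and not re-merged
        }
        return total                   // compress every spill
             + total                   // decompress every spill for the merge
             + total;                  // compress the merged final map output file
    }

    /** Proposed MapTask: spills stay uncompressed; only the final file is compressed. */
    static long proposedMapSide(int spills, long spillBytes) {
        return spills * spillBytes;
    }

    /** Current ReduceTask: every intermediate pass decompresses and recompresses all of its input. */
    static long currentReduceSide(int mergePasses, long reduceBytes) {
        return 2L * mergePasses * reduceBytes  // decompress + recompress on each pass
             + reduceBytes;                     // final decompression feeding the reducer
    }

    /** Proposed ReduceTask: decompress fetched map outputs once, then merge uncompressed data. */
    static long proposedReduceSide(int mergePasses, long reduceBytes) {
        return reduceBytes;                     // independent of the number of merge passes
    }

    public static void main(String[] args) {
        // Illustrative sizes in MB; not measurements.
        System.out.println("map side:    current=" + currentMapSide(4, 100)
                + " MB, proposed=" + proposedMapSide(4, 100) + " MB");
        System.out.println("reduce side: current=" + currentReduceSide(2, 1000)
                + " MB, proposed=" + proposedReduceSide(2, 1000) + " MB");
        // prints:
        // map side:    current=1200 MB, proposed=400 MB
        // reduce side: current=5000 MB, proposed=1000 MB
    }
}
```

      Under these assumptions the codec work on the map side drops by a factor of three once there is more than one spill, and the reduce-side codec work no longer grows with the number of merge passes.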

      As a result, the same data is compressed and decompressed many times.
      The reason we need mapred.map.output.compression.codec is to reduce network traffic.
      We should not compress and decompress the data again and again during the merge sort.

      We should only compress the final map output file that will be transmitted over the network.
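      A minimal sketch of the proposed map-side flow, assuming in-memory sorted runs stand in for spill files: the spills are k-way merged without any codec, and compression is applied exactly once, while writing the merged output. GZIP from java.util.zip is used only as a stand-in for whatever codec mapred.map.output.compression.codec names; this is not Hadoop's MapTask/ReduceTask code, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class FinalOnlyCompression {

    /** Current head record of one sorted, uncompressed spill run. */
    private static final class Head {
        final String value;
        final Iterator<String> rest;
        Head(String value, Iterator<String> rest) { this.value = value; this.rest = rest; }
    }

    /** K-way merge of already-sorted spills; the codec is applied only to the merged output. */
    static byte[] mergeAndCompress(List<List<String>> spills) throws IOException {
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> a.value.compareTo(b.value));
        for (List<String> spill : spills) {
            Iterator<String> it = spill.iterator();
            if (it.hasNext()) heap.add(new Head(it.next(), it));
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Single compression step: only the final output goes through GZIP.
        try (Writer out = new OutputStreamWriter(new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            while (!heap.isEmpty()) {
                Head h = heap.poll();
                out.write(h.value);
                out.write('\n');
                if (h.rest.hasNext()) heap.add(new Head(h.rest.next(), h.rest));
            }
        }
        return bytes.toByteArray();
    }

    /** Receiving side: decompress the transmitted output once and read records in order. */
    static List<String> decompress(byte[] compressed) throws IOException {
        List<String> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) records.add(line);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> spills = List.of(
                List.of("apple", "melon"),
                List.of("banana", "kiwi"),
                List.of("cherry"));
        byte[] finalOutput = mergeAndCompress(spills);
        System.out.println(decompress(finalOutput)); // prints [apple, banana, cherry, kiwi, melon]
    }
}
```

      The trade-off is that uncompressed spills use more local disk space and I/O than compressed ones, in exchange for far less CPU spent in the codec.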


    People

    • Assignee: Scott Chen
    • Reporter: Scott Chen
    • Votes: 0
    • Watchers: 9
