Hadoop Common / HADOOP-1216

Hadoop should support reduce none option



    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None


      This has been a highly desired feature in the streaming world and has occasionally been requested on the non-streaming side.
      Streaming implemented a working (hacky) solution, but it creates a discrepancy between the Hadoop
      streaming and non-streaming models. It would be nice if Hadoop offered such a feature
      that works for both streaming and non-streaming jobs. Owen and I discussed this a bit, and here is the
      general idea for further discussion and suggestions:

      1. Allow the user to specify reducer=none in the jobconf.
      2. The user can still specify the output format and output directory.
      3. Each mapper will generate an output file in the specified directory. The naming convention can still be part-xxxxxxxx,
      where xxxxxxxx is the map task number.
      4. The mapoutput collector of a mapper task will be a record writer on the
      5. The mapper will call output.collect() to write the output, so the same mapper class can be
      used regardless of whether reducer=none is set.
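
      The points above can be illustrated with a small, self-contained Java sketch. It does not use
      the Hadoop API: Collector, DirectFileCollector, ShuffleBufferCollector, upperCaseMap, and the
      "part-%08d" file name pattern are hypothetical names chosen for illustration (the eight-digit
      width only mirrors the part-xxxxxxxx placeholder). The sketch only aims to show that the map
      function calls collect() the same way in both modes.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ReduceNoneSketch {

    /** Hypothetical stand-in for the collector a mapper writes to. */
    interface Collector extends AutoCloseable {
        void collect(String key, String value);
        @Override void close();
    }

    /** Reduce-none mode: records go straight to the mapper's own part file. */
    static final class DirectFileCollector implements Collector {
        private final BufferedWriter writer;

        DirectFileCollector(Path outputDir, int mapTaskNumber) throws IOException {
            Files.createDirectories(outputDir);
            writer = Files.newBufferedWriter(outputDir.resolve(partName(mapTaskNumber)));
        }

        @Override public void collect(String key, String value) {
            try {
                writer.write(key + "\t" + value);
                writer.newLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override public void close() {
            try {
                writer.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Normal mode: records are buffered for a later shuffle (modelled as a list). */
    static final class ShuffleBufferCollector implements Collector {
        final List<String> buffered = new ArrayList<>();

        @Override public void collect(String key, String value) {
            buffered.add(key + "\t" + value);
        }

        @Override public void close() { }
    }

    /** Assumed naming scheme: "part-" plus the zero-padded map task number. */
    static String partName(int mapTaskNumber) {
        return String.format(Locale.ROOT, "part-%08d", mapTaskNumber);
    }

    /** The map function is identical in both modes; it only sees a Collector. */
    static void upperCaseMap(List<String> lines, Collector output) {
        for (String line : lines) {
            output.collect(line.toUpperCase(Locale.ROOT), Integer.toString(line.length()));
        }
    }

    public static void main(String[] args) throws IOException {
        Path outputDir = Files.createTempDirectory("reduce-none-sketch");
        // Map task 3 writes its records directly into the job output directory.
        try (Collector out = new DirectFileCollector(outputDir, 3)) {
            upperCaseMap(List.of("alpha", "be"), out);
        }
        System.out.println(Files.readAllLines(outputDir.resolve(partName(3))));
    }
}
```

      Swapping DirectFileCollector for ShuffleBufferCollector changes where the records end up
      without touching upperCaseMap, which is the property point 5 asks for.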

      When the reducer is set to none for a job, no map output files will be written to the local file system at all,
      and no data will be shuffled between mappers and reducers. As a matter of fact, the framework may choose
      not to create reducers at all.
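
      A minimal sketch of that scheduling decision, again without the Hadoop API. JobPlan, plan,
      and the "none" sentinel string are hypothetical; the real configuration key and value are
      whatever the attached patch defines.

```java
public class ReduceNonePlan {

    /** Hypothetical summary of what the framework would schedule for a job. */
    static final class JobPlan {
        final int mapTasks;
        final int reduceTasks;
        final boolean shuffle;         // is map output moved to reducers?
        final boolean localMapOutput;  // is map output written to local disk first?

        JobPlan(int mapTasks, int reduceTasks, boolean shuffle, boolean localMapOutput) {
            this.mapTasks = mapTasks;
            this.reduceTasks = reduceTasks;
            this.shuffle = shuffle;
            this.localMapOutput = localMapOutput;
        }
    }

    /** With reducer=none: no reduce tasks, no shuffle, no local map output files. */
    static JobPlan plan(String reducerSetting, int mapTasks, int requestedReduceTasks) {
        boolean reduceNone = "none".equalsIgnoreCase(reducerSetting);
        if (reduceNone) {
            return new JobPlan(mapTasks, 0, false, false);
        }
        return new JobPlan(mapTasks, requestedReduceTasks, true, true);
    }

    public static void main(String[] args) {
        JobPlan p = plan("none", 4, 2);
        System.out.println(p.mapTasks + " map tasks, " + p.reduceTasks + " reduce tasks");
    }
}
```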


        1. patch_1216.txt
          16 kB
          Runping Qi



            Assignee: Runping Qi
            Reporter: Runping Qi