  1. Hadoop Common
  2. HADOOP-115

permit reduce input types to differ from reduce output types

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.3.0
    • Component/s: None
    • Labels: None

    Description

      When map tasks write intermediate data out, they always use a SequenceFile RecordWriter with the key/value classes from the job object.

      When the reducers write the final results out, their output format is obtained from the job object. By default it is TextOutputFormat, so there is no conflict.
      However, if one wants to use SequenceFileOutputFormat for the final results, the key/value classes are also obtained from the job object, and they are the same ones used for the map tasks' output. This is the problem: the map outputs and the reducer outputs cannot use different key/value classes if one wants the reducers to generate their output in SequenceFileOutputFormat.

      A simple fix would be to add two more attributes to the JobConf class: mapOutputKeyClass and mapOutputValueClass. That would allow the user to have different key/value classes for the intermediate and final outputs.
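
      The proposed fix could look roughly like the sketch below. It is not the attached patch and does not use the real JobConf: SketchJobConf, its property keys, and the fallback to the final output classes when no map output class is set are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for JobConf. It is NOT the Hadoop class or the attached patch.
public class SketchJobConf {
    // Property keys are made up for this sketch.
    private final Map<String, Class<?>> classes = new HashMap<>();

    // Classes used for the final (reducer) output.
    public void setOutputKeyClass(Class<?> c) { classes.put("output.key", c); }
    public void setOutputValueClass(Class<?> c) { classes.put("output.value", c); }
    public Class<?> getOutputKeyClass() { return classes.get("output.key"); }
    public Class<?> getOutputValueClass() { return classes.get("output.value"); }

    // The two proposed attributes: classes used for the intermediate (map) output.
    public void setMapOutputKeyClass(Class<?> c) { classes.put("map.output.key", c); }
    public void setMapOutputValueClass(Class<?> c) { classes.put("map.output.value", c); }

    // Assumption: when no map output class is set, fall back to the final
    // output class, so jobs that use one pair of classes keep working.
    public Class<?> getMapOutputKeyClass() {
        return classes.getOrDefault("map.output.key", getOutputKeyClass());
    }
    public Class<?> getMapOutputValueClass() {
        return classes.getOrDefault("map.output.value", getOutputValueClass());
    }

    public static void main(String[] args) {
        SketchJobConf conf = new SketchJobConf();
        conf.setOutputKeyClass(String.class);
        conf.setOutputValueClass(Long.class);
        // Only the intermediate value class differs from the final one.
        conf.setMapOutputValueClass(Integer.class);

        System.out.println("map key:      " + conf.getMapOutputKeyClass().getName());
        System.out.println("map value:    " + conf.getMapOutputValueClass().getName());
        System.out.println("output value: " + conf.getOutputValueClass().getName());
    }
}
```

      With this shape, the map-side writer would read getMapOutputKeyClass() and getMapOutputValueClass(), while the reducer's output format would keep reading getOutputKeyClass() and getOutputValueClass().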

      Attachments

        1. hadoop-115_tk.patch
          11 kB
          Teppo Kurki
        2. hadoop-115_ReduceTask.patch
          13 kB
          Teppo Kurki
        3. patch_115.txt.2006_05_16
          5 kB
          Runping Qi

          People

            Assignee: Runping Qi
            Reporter: Runping Qi
            Votes: 1
            Watchers: 2
