[HADOOP-1216] Hadoop should support reduce none option - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.13.0
Component/s: None
Labels: None

Description

This has been a highly desired feature in the streaming world and has occasionally been requested on the non-streaming side.
Streaming implemented a working (if hacky) solution, but it creates a discrepancy between the Hadoop
streaming and non-streaming models. It would be nice if Hadoop offered such a feature
that works for both streaming and non-streaming jobs. Owen and I discussed this a bit, and here is the
general idea for further discussion and suggestions:

1. Allow the user to specify reducer=none in the jobconf.
2. The user can still specify the output format and output directory.
3. Each mapper will generate an output file in the specified directory. The naming convention can still be part-xxxxxxxx,
where xxxxxxxx is the map task number.
4. The mapoutput collector of a mapper task will be a record writer on the
5. The mapper will call output.collect() to write its output, so the same mapper class can be
used regardless of whether reducer=none is set.
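
The steps above can be illustrated with a small, self-contained sketch in plain Java. It does not use the Hadoop API; `Collector`, `Mapper`, `partFileName` and `runMapTask` are hypothetical names invented for this illustration, and the tab-separated text output is an assumption standing in for whatever output format the user configures. The point is only that the mapper keeps calling `collect()`, while the collector it receives writes straight to the task's part file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReduceNoneSketch {

    /** Hypothetical stand-in for the collector handed to a mapper. */
    interface Collector {
        void collect(String key, String value) throws IOException;
    }

    /** Hypothetical mapper contract; unchanged whether or not reducers exist. */
    interface Mapper {
        void map(String key, String value, Collector output) throws IOException;
    }

    /** Builds the per-task file name: "part-" plus the zero-padded map task number. */
    static String partFileName(int mapTaskNumber) {
        return String.format("part-%08d", mapTaskNumber);
    }

    /**
     * Runs one map task as if reducer=none were set: the collector is backed
     * by a writer on the task's part file in the output directory, so nothing
     * is buffered for a shuffle. Assumption: records are written as key TAB value.
     */
    static Path runMapTask(int mapTaskNumber, List<String[]> input,
                           Mapper mapper, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        Path partFile = outputDir.resolve(partFileName(mapTaskNumber));
        try (BufferedWriter writer = Files.newBufferedWriter(partFile)) {
            // The "record writer" collector: each collect() call appends one line.
            Collector direct = (key, value) -> {
                writer.write(key + "\t" + value);
                writer.newLine();
            };
            for (String[] record : input) {
                mapper.map(record[0], record[1], direct);
            }
        }
        return partFile;
    }

    public static void main(String[] args) throws IOException {
        Path outputDir = Files.createTempDirectory("reduce-none");
        // The mapper only knows about collect(); it is unaware of where output goes.
        Mapper upperCase = (key, value, output) -> output.collect(key, value.toUpperCase());
        List<String[]> split = List.of(new String[] {"k1", "a"}, new String[] {"k2", "b"});
        Path part = runMapTask(3, split, upperCase, outputDir);
        System.out.println(part.getFileName() + ": " + Files.readAllLines(part));
    }
}
```

Because the mapper depends only on the collector interface, the same mapper could be handed a different collector (one that buffers output for reducers) when reducer=none is not set.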

When the reducer is set to none for a job, no map output files are written to the local file system at all,
and no data is shuffled between mappers and reducers. As a matter of fact, the framework may choose
not to create reducers at all.

Attachments

patch_1216.txt (16 kB), uploaded 26/Apr/07 07:59 by Runping Qi

People

Assignee: Runping Qi

Reporter: Runping Qi

Votes: 0

Watchers: 0

Dates

Created: 06/Apr/07 16:52

Updated: 08/Jul/09 16:52

Resolved: 26/Apr/07 21:42