[HADOOP-1284] clean up the protocol between stream mapper/reducer and the framework - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.13.0
Component/s: None
Labels: None

Description

Right now, the protocol between the stream mapper/reducer and the framework is very inflexible.
The mapper/reducer generates line-oriented output. The framework reads that output line by line and splits
each line into a key/value pair. By default, the substring up to the first tab character is the key, and the
substring after the first tab character is the value.
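
To make the current behavior concrete, here is a minimal Python sketch of the default split. It is only an illustration, not the framework's code: the function name `split_default` is hypothetical, and returning an empty value for a line with no tab is an assumption, since that case is not described here.

```python
def split_default(line):
    """Split one line of mapper/reducer output at the first tab character.

    Returns (key, value). Hypothetical helper, for illustration only.
    """
    # Drop the trailing newline that line-oriented output carries.
    line = line.rstrip("\n")
    # str.partition splits at the FIRST occurrence of the separator only,
    # so any later tabs stay inside the value.
    key, _sep, value = line.partition("\t")
    # Assumption: a line without a tab yields the whole line as the key
    # and an empty string as the value.
    return key, value
```

For instance, `split_default("k\tv1\tv2\n")` gives the key `"k"` and the value `"v1\tv2"`.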

However, in many cases, the application wants some control over how the pair is split.
Here, I'd like to introduce the following configuration variables for that:

1. "streaming.output.field.separator": the field separator. The default is the tab character,
but the user can specify a different one (e.g. ':' or ', ').
A map output line is then treated as a list of fields separated by the separator.

2. "streaming.num.fields.for.mapout.key": the number of leading fields that will be used as the map output key
(and for sorting on the reduce side).
The default value is 1.
The remaining fields will be used as the value. For example, I can specify the first 5 fields as my mapout key.

3. "streaming.num.fields.for.partitioning": Sometimes I want to use fewer fields for partitioning, to
achieve the "primary/secondary" composite
key effect proposed in HADOOP-485. The default value is 1.
For example, I can set "streaming.num.fields.for.partitioning" to 3
and "streaming.num.fields.for.mapout.key" to 5.
This effectively says that fields 1 to 3 are my primary key and fields 4 and 5 are my secondary key.

With the above default values, it is compatible with the current behavior
while introducing a new desirable feature in a clean way.
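
As a rough illustration of how the three proposed settings would interact, here is a small Python sketch. It is not the attached patch: the function name `split_fields` and its parameter names are hypothetical stand-ins for "streaming.output.field.separator", "streaming.num.fields.for.mapout.key" and "streaming.num.fields.for.partitioning", and rejoining the fields with the same separator is an assumption.

```python
def split_fields(line, separator="\t", num_key_fields=1, num_partition_fields=1):
    """Split one map output line into (key, value, partition_key).

    Hypothetical helper; the defaults mirror the proposed default values.
    """
    # Treat the line as a list of fields separated by the separator.
    fields = line.rstrip("\n").split(separator)
    # The leading fields form the map output key (used for sorting).
    key = separator.join(fields[:num_key_fields])
    # The remaining fields form the value.
    value = separator.join(fields[num_key_fields:])
    # A (possibly shorter) prefix of the key fields is used for partitioning.
    partition_key = separator.join(fields[:num_partition_fields])
    return key, value, partition_key


# Example from the description: 5 key fields, 3 partitioning fields.
example = split_fields("a:b:c:d:e:f:g", separator=":",
                       num_key_fields=5, num_partition_fields=3)
```

Here `example` holds the key `"a:b:c:d:e"`, the value `"f:g"` and the partition key `"a:b:c"`, so fields 4 and 5 (`d` and `e`) act as the secondary key. With the defaults, `split_fields("k\tv1\tv2")` puts `"k"` in both the key and the partition key and `"v1\tv2"` in the value, which matches the current first-tab split.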

Thoughts?

Attachments

patch-1284.txt
25/Apr/07 18:12
35 kB
Runping Qi

People

Assignee: Runping Qi

Reporter: Runping Qi

Votes: 0

Watchers: 0

Dates

Created: 21/Apr/07 00:31

Updated: 08/Jun/07 20:40

Resolved: 26/Apr/07 20:59