[HADOOP-485] allow a different comparator for grouping keys in calls to reduce - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.5.0
Fix Version/s: 0.13.0
Component/s: None
Labels: None

Description

Some algorithms require that the values passed to reduce be sorted in a particular order, but extending the key with the additional sort fields causes values that belong together to be handled by separate calls to reduce. (The user then has to collect the values until they detect a "real" key change and only then process them.)

It would be much easier if the framework let you define a second comparator that groups the values for each call to reduce. So if your reduce inputs look like:

A1, V1
A2, V2
A3, V3
B1, V4
B2, V5

instead of getting calls to reduce that look like:

reduce(A1, {V1});
reduce(A2, {V2});
reduce(A3, {V3});
reduce(B1, {V4});
reduce(B2, {V5});

you could define the grouping comparator to just compare the letters and end up with:

reduce(A1, {V1,V2,V3});
reduce(B1, {V4,V5});

which is the desired outcome. Note that this assumes that the "extra" part of the key is just for sorting because the reduce will only see the first representative of each equivalence class.
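The behavior requested above can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions: the class name GroupingSketch and the methods reduceCalls and demo are hypothetical, the sort comparator is taken to be natural string order on the full key, and the grouping comparator is taken to compare only the leading letter. It is not the code from the attached patches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of sort-then-group on the reduce side; not Hadoop code.
public class GroupingSketch {

    /**
     * Sorts the records by key with sortCmp, then opens a new reduce call
     * whenever groupCmp says the key differs from the current group's first key.
     * Each call is rendered as a string so the result is easy to inspect.
     */
    static List<String> reduceCalls(List<Map.Entry<String, String>> input,
                                    Comparator<String> sortCmp,
                                    Comparator<String> groupCmp) {
        List<Map.Entry<String, String>> sorted = new ArrayList<>(input);
        sorted.sort(Map.Entry.comparingByKey(sortCmp));

        List<String> calls = new ArrayList<>();
        String groupKey = null;               // first key seen in the current group
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> e : sorted) {
            if (groupKey != null && groupCmp.compare(groupKey, e.getKey()) != 0) {
                // Key left the equivalence class: emit the finished group.
                calls.add("reduce(" + groupKey + ", " + values + ")");
                values = new ArrayList<>();
                groupKey = null;
            }
            if (groupKey == null) {
                groupKey = e.getKey();        // only the first representative is kept
            }
            values.add(e.getValue());
        }
        if (groupKey != null) {
            calls.add("reduce(" + groupKey + ", " + values + ")");
        }
        return calls;
    }

    /** Runs the example from the description, with the input deliberately unsorted. */
    static List<String> demo() {
        List<Map.Entry<String, String>> input = List.of(
                Map.entry("B2", "V5"), Map.entry("A1", "V1"), Map.entry("A3", "V3"),
                Map.entry("B1", "V4"), Map.entry("A2", "V2"));
        Comparator<String> sortCmp = Comparator.naturalOrder();               // full key
        Comparator<String> groupCmp = Comparator.comparing(k -> k.charAt(0)); // letter only
        return reduceCalls(input, sortCmp, groupCmp);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Because the full-key sort runs first, the values inside each group arrive in key order, and the reduce call is labeled with the first key of its group (A1 and B1 here), which matches the note above about seeing only the first representative.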

Attachments

485.patch                         02/May/07 21:54  8 kB  Tahir Hashmi
485.patch                         02/May/07 17:44  8 kB  Tahir Hashmi
485.patch                         30/Apr/07 14:10  8 kB  Tahir Hashmi
485.patch                         25/Apr/07 16:19  8 kB  Tahir Hashmi
485.patch                         24/Apr/07 11:30  7 kB  Tahir Hashmi
Hadoop-485-pre.patch              18/Apr/07 14:11  3 kB  Tahir Hashmi
TestUserValueGrouping.java.patch  18/Apr/07 14:15  5 kB  Tahir Hashmi

Issue Links

is duplicated by

HADOOP-686 job.setOutputValueComparatorClass(theClass) should be supported

Closed

People

Assignee: Tahir Hashmi

Reporter: Owen O'Malley

Votes: 1

Watchers: 1

Dates

Created: 26/Aug/06 07:51

Updated: 08/Jul/09 16:51

Resolved: 03/May/07 19:37