
Key: HADOOP-485
Type: New Feature
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Tahir Hashmi
Reporter: Owen O'Malley
Votes: 1
Watchers: 1
Hadoop Common

allow a different comparator for grouping keys in calls to reduce

Created: 26/Aug/06 07:51 AM   Updated: 08/Jul/09 04:51 PM
Component/s: None
Affects Version/s: 0.5.0
Fix Version/s: 0.13.0

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
  File                              Date                 Author        Size
  485.patch                         2007-05-02 09:54 PM  Tahir Hashmi  8 kB
  485.patch                         2007-05-02 05:44 PM  Tahir Hashmi  8 kB
  485.patch                         2007-04-30 02:10 PM  Tahir Hashmi  8 kB
  485.patch                         2007-04-25 04:19 PM  Tahir Hashmi  8 kB
  485.patch                         2007-04-24 11:30 AM  Tahir Hashmi  7 kB
  Hadoop-485-pre.patch              2007-04-18 02:11 PM  Tahir Hashmi  3 kB
  TestUserValueGrouping.java.patch  2007-04-18 02:15 PM  Tahir Hashmi  5 kB
Issue Links:
Duplicate
 

Resolution Date: 03/May/07 07:37 PM


Description
Some algorithms require the values passed to reduce to arrive in a particular order, but extending the key with the additional sort fields causes values that share the original key to be handled by different calls to reduce. (The user then has to collect the values until they detect a "real" key change and only then process them.)
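The workaround described above can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions, not Hadoop code: composite keys are plain strings such as `A1` whose first character is the "real" key, and `CollectingReducer`, `realKey`, and `close` are hypothetical names. The framework is simulated by calling `reduce` once per composite key in sorted order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not a Hadoop API: the user-side workaround when the framework
// calls reduce() once per composite key ("A1", "A2", ...).
public class CollectingReducer {
    private String currentRealKey = null;     // "real" key of the group being buffered
    private String firstCompositeKey = null;  // composite key that started the group
    private final List<String> buffered = new ArrayList<>();
    public final List<String> output = new ArrayList<>(); // one entry per processed group

    // Assumption: the "real" key is the first character; the rest exists only for sorting.
    static String realKey(String compositeKey) {
        return compositeKey.substring(0, 1);
    }

    // Called by the (simulated) framework once per composite key, in sorted order.
    public void reduce(String compositeKey, List<String> values) {
        String rk = realKey(compositeKey);
        if (!rk.equals(currentRealKey)) {
            flush();                          // "real" key changed: process the previous group
            currentRealKey = rk;
            firstCompositeKey = compositeKey;
        }
        buffered.addAll(values);
    }

    // Must be called after the last reduce() so the final group is not lost.
    public void close() {
        flush();
    }

    private void flush() {
        if (currentRealKey == null) {
            return;                           // nothing buffered yet
        }
        output.add(firstCompositeKey + " -> " + buffered); // stand-in for real processing
        buffered.clear();
        currentRealKey = null;
        firstCompositeKey = null;
    }

    public static void main(String[] args) {
        CollectingReducer r = new CollectingReducer();
        String[][] input = {{"A1", "V1"}, {"A2", "V2"}, {"A3", "V3"}, {"B1", "V4"}, {"B2", "V5"}};
        for (String[] kv : input) {
            r.reduce(kv[0], List.of(kv[1]));
        }
        r.close();
        System.out.println(r.output); // [A1 -> [V1, V2, V3], B1 -> [V4, V5]]
    }
}
```

The extra state and the easy-to-forget `close()` call are exactly the bookkeeping the request below would move into the framework.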

It would be much easier if the framework let you define a second comparator that did the grouping of values for reduce. For example, if your reduce inputs look like:

A1, V1
A2, V2
A3, V3
B1, V4
B2, V5

instead of getting calls to reduce that look like:

reduce(A1, {V1}); reduce(A2, {V2}); reduce(A3, {V3}); reduce(B1, {V4}); reduce(B2, {V5});

you could define the grouping comparator to just compare the letters and end up with:

reduce(A1, {V1,V2,V3}); reduce(B1, {V4,V5});

which is the desired outcome. Note that this assumes that the "extra" part of the key is used only for sorting, because reduce will see only the first representative of each equivalence class.
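A minimal sketch of the proposed behaviour, assuming the same string keys as the example (letter = grouping part, digit = sort-only part). `GroupingSketch`, `SORT`, `GROUP`, and `runReduces` are hypothetical names; the code simulates the framework side in plain Java rather than using Hadoop's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hadoop's implementation: a framework that sorts with one
// comparator and uses a second, coarser comparator to decide where reduce groups end.
public class GroupingSketch {
    // Sort comparator: orders by the full composite key, so "A1" < "A2" < "A3" < "B1".
    static final Comparator<String> SORT = Comparator.naturalOrder();
    // Grouping comparator: compares only the letter (assumed to be the first character).
    static final Comparator<String> GROUP = Comparator.comparing(k -> k.substring(0, 1));

    // Returns a description of each reduce call that would be made.
    public static List<String> runReduces(List<Map.Entry<String, String>> records) {
        List<Map.Entry<String, String>> sorted = new ArrayList<>(records);
        sorted.sort(Map.Entry.comparingByKey(SORT));
        List<String> calls = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            // The first key in each equivalence class is the one reduce sees.
            String groupKey = sorted.get(i).getKey();
            List<String> values = new ArrayList<>();
            // Consume records until the grouping comparator reports a new group.
            while (i < sorted.size() && GROUP.compare(groupKey, sorted.get(i).getKey()) == 0) {
                values.add(sorted.get(i).getValue());
                i++;
            }
            calls.add("reduce(" + groupKey + ", {" + String.join(",", values) + "})");
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> records = List.of(
                Map.entry("B2", "V5"), Map.entry("A2", "V2"), Map.entry("A1", "V1"),
                Map.entry("B1", "V4"), Map.entry("A3", "V3"));
        // prints reduce(A1, {V1,V2,V3}) then reduce(B1, {V4,V5})
        runReduces(records).forEach(System.out::println);
    }
}
```

Comparing only adjacent records is enough here because the sort comparator orders by the letter first, so every key in one equivalence class is contiguous after sorting. A grouping comparator that is not consistent with the sort order in this way would split one class across several reduce calls.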


