Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Incomplete
Description
It would be useful to add a generic sort infrastructure to the Map-Reduce framework to ease usage.
Specifically, the idea is to add a fairly generic and powerful comparator which users can configure to meet their specific needs.
Spec:
--------
The proposal is to model a generic (uber) comparator along the lines of the standard Unix sort command. The comparator provides the following (configurable) functionality:
a) Separator for breaking up the data (stream) into 'columns'.
b) Multiple key ranges for specifying priorities of 'columns' (a la the --keys/-k option of Unix sort, e.g. -k 2,3 -k 1,4 etc.).
c) A variant of a) to let the user specify byte-range boundaries for 'columns' without using a separator.
d) Option to sort 'reverse'.
e) Option to do a 'stable' sort, i.e. don't do a last-ditch comparison of all bytes if all key ranges match.
f) Option to do 'numeric' comparisons instead of lexicographical comparisons?
Of course, all of these are optional, with the default behaviour remaining as it is today.
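To make the options above concrete, here is a minimal sketch of how such a comparator could look. It is plain Java over text lines and not tied to any Map-Reduce API. The names GenericKeyComparator and KeySpec are hypothetical, invented for illustration only. It covers a), b), d), e) and f); the byte-range variant c) is left out, and 'reverse' is applied per key range, which is an assumption rather than part of the spec.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of a sort(1)-style configurable comparator over text lines. */
class GenericKeyComparator implements Comparator<String> {

    /** One key range, similar to "-k start,end" (1-based, inclusive columns). */
    static final class KeySpec {
        final int start;
        final int end;
        final boolean numeric;  // f) compare as numbers instead of lexicographically
        final boolean reverse;  // d) invert the result for this key range (assumed per-key)

        KeySpec(int start, int end, boolean numeric, boolean reverse) {
            this.start = start;
            this.end = end;
            this.numeric = numeric;
            this.reverse = reverse;
        }
    }

    private final String separator;    // a) literal column separator
    private final List<KeySpec> keys;  // b) key ranges in priority order
    private final boolean stable;      // e) skip the last-ditch whole-line comparison

    GenericKeyComparator(String separator, List<KeySpec> keys, boolean stable) {
        this.separator = separator;
        this.keys = keys;
        this.stable = stable;
    }

    @Override
    public int compare(String a, String b) {
        // Limit -1 keeps trailing empty columns.
        String[] fa = a.split(Pattern.quote(separator), -1);
        String[] fb = b.split(Pattern.quote(separator), -1);
        for (KeySpec k : keys) {
            String ka = extract(fa, k);
            String kb = extract(fb, k);
            int c = k.numeric
                    ? Double.compare(parse(ka), parse(kb))
                    : ka.compareTo(kb);
            if (c != 0) {
                return k.reverse ? -c : c;
            }
        }
        // All key ranges matched: either stop here (stable) or compare whole lines.
        return stable ? 0 : a.compareTo(b);
    }

    /** Joins columns start..end; columns missing from the line are treated as empty. */
    private String extract(String[] fields, KeySpec k) {
        int from = Math.min(k.start - 1, fields.length);
        int to = Math.min(k.end, fields.length);
        if (from >= to) {
            return "";
        }
        return String.join(separator, Arrays.copyOfRange(fields, from, to));
    }

    /** Non-numeric or empty keys are treated as 0 in this sketch. */
    private static double parse(String s) {
        try {
            return Double.parseDouble(s.trim());
        } catch (NumberFormatException e) {
            return 0.0;
        }
    }
}
```

With a tab separator, a single numeric key on column 2 and stable set to true, this behaves roughly like sort -s -t $'\t' -k 2,2n: the lines "b\t9" and "a\t10" order as 9 before 10, whereas a lexicographical key would put "10" first.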
- * - * -
Anything more/less?
thanks,
Arun