  1. Hadoop Map/Reduce
  2. MAPREDUCE-221

Generic 'Sort' Infrastructure for Map-Reduce framework.


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete

    Description

      It would be useful to add a generic sort infrastructure to the Map-Reduce framework to make sorting easier to use.
      Specifically, the idea is to add a fairly generic and powerful comparator that users can configure to meet their specific needs.

      Spec:
      --------

      The proposal is to model a generic (uber) comparator along the lines of the standard Unix sort command. The comparator provides the following (configurable) functionality:

      a) Separator for breaking up the data (stream) into 'columns'.
      b) Multiple key ranges for specifying priorities of 'columns' (à la the --keys/-k option of Unix sort, i.e. -k 2,3 -k 1,4 etc.).
      c) A variant of a) that lets the user specify byte-range boundaries for 'columns' without using a separator.
      d) Option to sort 'reverse'.
      e) Option to do a 'stable' sort, i.e. don't do a last-ditch comparison of all bytes if all key ranges match.
      f) Option to do 'numeric' comparisons instead of lexicographical comparisons?

      Of course, all of these are optional, with the default behaviour remaining as it is today.
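
      The options a) through f) can be illustrated with a minimal, language-neutral sketch. This is not Hadoop code and not the implementation proposed in this issue; `KeySpec`, `make_comparator`, and all parameter names are hypothetical, chosen only for illustration. It also assumes text records and character offsets rather than raw bytes, and treats non-numeric fields as 0 in numeric mode.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import Callable, Optional, Sequence


@dataclass
class KeySpec:
    """One key range, similar to `-k start,end` (1-based, inclusive). Hypothetical."""
    start: int
    end: int
    numeric: bool = False  # option f): compare as numbers
    reverse: bool = False  # option d): invert the order for this key


def _to_number(text: str) -> float:
    # Assumption: fields that do not parse as numbers compare as 0.
    try:
        return float(text)
    except ValueError:
        return 0.0


def make_comparator(
    keys: Sequence[KeySpec],
    separator: Optional[str] = "\t",
    stable: bool = False,
) -> Callable[[str, str], int]:
    """Build a cmp-style function (negative, zero, positive) from key ranges."""

    def extract(record: str, spec: KeySpec) -> str:
        if separator is None:
            # Option c): start/end are offsets into the record, no separator.
            return record[spec.start - 1:spec.end]
        # Options a) and b): split into columns and take the requested range.
        fields = record.split(separator)
        return separator.join(fields[spec.start - 1:spec.end])

    def compare(a: str, b: str) -> int:
        for spec in keys:  # earlier key ranges have higher priority
            ka, kb = extract(a, spec), extract(b, spec)
            if spec.numeric:
                ka, kb = _to_number(ka), _to_number(kb)
            result = (ka > kb) - (ka < kb)
            if result != 0:
                return -result if spec.reverse else result
        if stable:
            # Option e): skip the last-ditch comparison of the whole record.
            return 0
        return (a > b) - (a < b)

    return compare


records = ["b\t10\tx", "a\t9\ty", "a\t10\tz"]
# Column 1 ascending, then column 2 numerically descending.
by_name_then_number = make_comparator(
    [KeySpec(1, 1), KeySpec(2, 2, numeric=True, reverse=True)]
)
sorted_records = sorted(records, key=cmp_to_key(by_name_then_number))
# sorted_records == ["a\t10\tz", "a\t9\ty", "b\t10\tx"]
```

      Passing `separator=None` switches to offset-based key ranges, and `stable=True` makes records with identical keys compare as equal, so the underlying sort keeps their input order.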


      Anything more/less?

      thanks,
      Arun


          People

            acmurthy Arun Murthy