
Key: HADOOP-3019
Type: New Feature
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Chris Douglas
Reporter: Doug Cutting
Votes: 0
Watchers: 2
Hadoop Common

want input sampler & sorted partitioner

Created: 14/Mar/08 04:22 PM   Updated: 08/Jul/09 04:52 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.19.0

Time Tracking:
Not Specified

File Attachments (text files, licensed for inclusion in ASF works):
  File          Date                 Author         Size
  3019-0.patch  2008-09-15 10:39 PM  Chris Douglas  33 kB
  3019-1.patch  2008-09-15 11:27 PM  Chris Douglas  34 kB
  3019-2.patch  2008-09-17 12:01 AM  Chris Douglas  34 kB
  3019-3.patch  2008-09-19 10:46 PM  Chris Douglas  39 kB
  3019-4.patch  2008-09-19 11:07 PM  Chris Douglas  39 kB
  3019-5.patch  2008-09-19 11:32 PM  Chris Douglas  40 kB
Issue Links:
Blocker
 

Hadoop Flags: Reviewed
Release Note: Added a partitioner that effects a total order of output data, and an input sampler that generates the partition keyset for TotalOrderPartitioner when the map's input key type and distribution approximate those of its output.
Resolution Date: 19/Sep/08 11:39 PM


Description
The input sampler should generate a small, random sample of the input, saved to a file.
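
One way to picture the sampler is the sketch below. The issue does not say how the sample should be drawn, so reservoir sampling is an assumption chosen here because it yields a uniform sample in a single pass without knowing the input size up front. The function names (reservoir_sample, write_sample) and the one-key-per-line file format are hypothetical and are not taken from the attached patches.

```python
import random


def reservoir_sample(keys, k, seed=None):
    """Return up to k keys drawn uniformly at random from an iterable.

    Assumption: reservoir sampling (Algorithm R); the issue itself only
    asks for "a small, random sample of the input".
    """
    rng = random.Random(seed)
    sample = []
    for i, key in enumerate(keys):
        if i < k:
            # Fill the reservoir with the first k keys.
            sample.append(key)
        else:
            # Keep the new key with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = key
    return sample


def write_sample(sample, path):
    """Save the sample to a text file, one key per line (hypothetical format)."""
    with open(path, "w", encoding="utf-8") as f:
        for key in sample:
            f.write(f"{key}\n")
```

Calling reservoir_sample(input_keys, 200) followed by write_sample(sample, "sample.txt") would produce the kind of file the partitioner is meant to read; both the file name and the sample size are placeholders.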

The partitioner should read the sample file and partition keys into relatively even-sized key-ranges, where the partition numbers correspond to key order.
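
The partitioning idea can be sketched as follows, assuming the sample has already been loaded into memory. The approach shown (sort the sample, take evenly spaced keys as cut points, binary-search each key against them) is one plausible reading of the description, not necessarily what the committed partitioner does. The names split_points and partition are hypothetical.

```python
from bisect import bisect_right


def split_points(sample, num_partitions):
    """Pick num_partitions - 1 cut points at even quantiles of the sample."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0 and num_partitions > 1:
        raise ValueError("cannot derive cut points from an empty sample")
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]


def partition(key, cuts):
    """Map a key to a partition number that follows key order.

    Keys below the first cut point go to partition 0, keys at or above
    the last cut point go to the final partition.
    """
    return bisect_right(cuts, key)
```

Because the partition number never decreases as the key increases, concatenating the reducers' sorted outputs in partition order gives a totally ordered result.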

Note that when the sampler is used for partitioning, the number of samples required is proportional to the number of reduce partitions. A sample of about 10 times the intended reducer count should give good results.
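
To make the sizing rule concrete, the sketch below computes the suggested sample count and then measures how many keys land in each partition when cut points are taken from a sample of that size. The factor of 10 comes from the description; everything else (the function names, uniform sampling via random.sample, the quantile-based cut points) is an assumption for illustration only.

```python
import random
from bisect import bisect_right


def sample_size(num_reducers, samples_per_reducer=10):
    """Suggested sample count: 10x the intended reducer count by default."""
    return num_reducers * samples_per_reducer


def partition_sizes(keys, num_reducers, seed=0):
    """Count how many keys fall into each partition for a list of keys."""
    if not keys:
        return [0] * num_reducers
    rng = random.Random(seed)
    # Never ask for more samples than there are keys.
    k = min(len(keys), sample_size(num_reducers))
    ordered = sorted(rng.sample(keys, k))
    # Evenly spaced sample keys serve as the partition boundaries.
    cuts = [ordered[(i * k) // num_reducers] for i in range(1, num_reducers)]
    sizes = [0] * num_reducers
    for key in keys:
        sizes[bisect_right(cuts, key)] += 1
    return sizes
```

For 20 reducers, sample_size(20) gives 200 samples. Running partition_sizes on representative input with different values of samples_per_reducer is a quick way to see how evenness changes with sample size.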


