  1. Hadoop Common
  2. HADOOP-3019

want input sampler & sorted partitioner

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added a partitioner that effects a total order of output data, and an input sampler that generates the partition keyset for TotalOrderPartitioner when the map's input key type and distribution approximate those of its output.

      Description

      The input sampler should generate a small, random sample of the input, saved to a file.
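
As a rough illustration of the sampling step, here is a minimal sketch using reservoir sampling over an iterator of keys. It is not the code from the attached patches: the class name `ReservoirSamplerSketch`, both method names, and the one-key-per-line text file format are assumptions made for this example only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; names and file format are illustrative, not Hadoop's.
class ReservoirSamplerSketch {

    /** Returns up to sampleSize keys chosen uniformly at random in one pass. */
    static <K> List<K> sample(Iterator<K> keys, int sampleSize, Random rng) {
        List<K> reservoir = new ArrayList<>(sampleSize);
        long seen = 0;
        while (keys.hasNext()) {
            K key = keys.next();
            seen++;
            if (reservoir.size() < sampleSize) {
                // Fill the reservoir with the first sampleSize keys.
                reservoir.add(key);
            } else {
                // Keep the new key with probability sampleSize / seen,
                // overwriting a uniformly chosen slot.
                long slot = (long) (rng.nextDouble() * seen);
                if (slot < sampleSize) {
                    reservoir.set((int) slot, key);
                }
            }
        }
        return reservoir;
    }

    /** Saves the sample as text, one key per line (an assumed format). */
    static void writeSample(List<?> sample, Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Object key : sample) {
            lines.add(String.valueOf(key));
        }
        Files.write(file, lines);
    }

    public static void main(String[] args) throws IOException {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            input.add(i);
        }
        List<Integer> picked = sample(input.iterator(), 40, new Random());
        Path out = Files.createTempFile("sample", ".txt");
        writeSample(picked, out);
        System.out.println("wrote " + picked.size() + " keys to " + out);
    }
}
```

Reservoir sampling is chosen here because it needs only one pass and a fixed amount of memory; other sampling strategies would satisfy the description equally well.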

      The partitioner should read the sample file and partition keys into relatively even-sized key-ranges, where the partition numbers correspond to key order.
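
The partitioning step can be sketched as follows: sort the sample, pick evenly spaced split points, then binary-search each key against them so that partition numbers follow key order. This is an illustration under assumptions, not the patch's implementation; `SortedPartitionerSketch`, `splitPoints`, and `partition` are hypothetical names, and the sample is held in memory rather than read from a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names are illustrative, not Hadoop's.
class SortedPartitionerSketch {

    /** Picks numPartitions - 1 evenly spaced split keys from the sample. */
    static <K extends Comparable<K>> List<K> splitPoints(List<K> sample, int numPartitions) {
        if (sample.isEmpty() || numPartitions < 1) {
            throw new IllegalArgumentException("need a non-empty sample and at least one partition");
        }
        List<K> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<K> splits = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            // The i-th split sits i/numPartitions of the way through the sorted sample.
            int idx = (int) ((long) i * sorted.size() / numPartitions);
            splits.add(sorted.get(idx));
        }
        return splits;
    }

    /** Maps a key to a partition; keys below splits[0] go to 0, and so on. */
    static <K extends Comparable<K>> int partition(List<K> splits, K key) {
        int pos = Collections.binarySearch(splits, key);
        // Found at index i: the key equals a split point and starts partition i + 1.
        // Not found: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("pear", "apple", "kiwi", "fig", "plum", "lime");
        List<String> splits = splitPoints(sample, 3);
        for (String key : Arrays.asList("banana", "mango", "zucchini")) {
            System.out.println(key + " -> partition " + partition(splits, key));
        }
    }
}
```

If the sample contains many duplicate keys, several split points can coincide and some partitions may receive no keys; a larger or more varied sample reduces that risk.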

      Note that when the sampler is used for partitioning, the number of samples required is proportional to the number of reduce partitions. A sample of roughly 10 times the intended reducer count should give good results.

        Attachments

        1. 3019-5.patch
          40 kB
          Christopher Douglas
        2. 3019-4.patch
          39 kB
          Christopher Douglas
        3. 3019-3.patch
          39 kB
          Christopher Douglas
        4. 3019-2.patch
          34 kB
          Christopher Douglas
        5. 3019-1.patch
          34 kB
          Christopher Douglas
        6. 3019-0.patch
          33 kB
          Christopher Douglas

            People

            • Assignee: Christopher Douglas (cdouglas)
            • Reporter: Doug Cutting (cutting)