Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      This feature lets you specify a Hadoop Partitioner for the following operations: GROUP/COGROUP, CROSS, DISTINCT, and JOIN (except 'skewed' join). The Partitioner controls how the keys of the intermediate map outputs are partitioned. See http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/Partitioner.html for more details.

      To use this feature, add a PARTITION BY clause to the appropriate operator:
      A = load 'input_data';
      B = group A by $0 PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2;
      .....
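      Here is the code for SimpleCustomPartitioner:

      import org.apache.hadoop.io.Writable;
      import org.apache.hadoop.mapreduce.Partitioner;
      import org.apache.pig.impl.io.PigNullableWritable;

      public class SimpleCustomPartitioner extends Partitioner<PigNullableWritable, Writable> {
          @Override
          public int getPartition(PigNullableWritable key, Writable value, int numPartitions) {
              // Integer keys are routed by value; every other key type falls back to its hash code.
              Object pigValue = key.getValueAsPigType();
              int raw = (pigValue instanceof Integer) ? ((Integer) pigValue).intValue() : key.hashCode();
              // Clear the sign bit so the result always lies in [0, numPartitions).
              return (raw & Integer.MAX_VALUE) % numPartitions;
          }
      }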
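
A partition index has to fall in the range 0 to numPartitions - 1, and Java's % operator returns a negative result for a negative left operand. The following is a minimal stand-alone sketch of modulo routing that clears the sign bit first. The class name PartitionIndexSketch and its methods are hypothetical and are not part of Pig or Hadoop; the sketch only mirrors the arithmetic, without the Hadoop types.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone illustration of modulo partitioning; not a Pig or Hadoop class. */
final class PartitionIndexSketch {

    /** Maps an int key to a partition in [0, numPartitions) by clearing the sign bit first. */
    static int partitionFor(int key, int numPartitions) {
        return (key & Integer.MAX_VALUE) % numPartitions;
    }

    /** Groups keys by the partition index they would be sent to. */
    static List<List<Integer>> bucket(int[] keys, int numPartitions) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int key : keys) {
            buckets.get(partitionFor(key, numPartitions)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Even keys land in partition 0, odd keys in partition 1.
        System.out.println(bucket(new int[] {0, 1, 2, 3, 4, 5}, 2)); // [[0, 2, 4], [1, 3, 5]]
        // Plain % keeps the sign of the left operand.
        System.out.println(-3 % 2);              // -1
        // Masking the sign bit keeps the index non-negative.
        System.out.println(partitionFor(-3, 2)); // 1
    }
}
```

Masking changes which partition a negative key lands in compared with its absolute value, but every occurrence of the same key still goes to the same partition, which is all the grouping operators need.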

      Description

      A custom partitioner would give users control over which output partition a key (or value) goes to. We could add keywords to the language, e.g.

      PARTITION BY UDF(...)

      or a similar syntax. The UDF returns a number between 0 and n-1, where n is the number of output partitions.
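
The contract proposed here (a user function that returns a number between 0 and n-1) can be sketched as a small range check around the function. This is only an illustration of the contract under stated assumptions: PartitionContractSketch and checkedPartition are hypothetical names, and the user function is modelled with the standard java.util.function.ToIntBiFunction rather than any Pig UDF interface.

```java
import java.util.function.ToIntBiFunction;

/** Hypothetical illustration of the "return 0..n-1" contract; not a Pig API. */
final class PartitionContractSketch {

    /**
     * Calls the user-supplied function with the key and the partition count,
     * and rejects any result outside [0, numPartitions).
     */
    static <K> int checkedPartition(ToIntBiFunction<K, Integer> udf, K key, int numPartitions) {
        int partition = udf.applyAsInt(key, numPartitions);
        if (partition < 0 || partition >= numPartitions) {
            throw new IllegalStateException(
                "partition " + partition + " is outside 0.." + (numPartitions - 1));
        }
        return partition;
    }

    public static void main(String[] args) {
        // Route string keys by length: "pig" has length 3, so with 2 partitions it goes to 1.
        System.out.println(checkedPartition((k, n) -> k.length() % n, "pig", 2));
    }
}
```

Failing fast on an out-of-range value is one possible design choice; the issue itself does not say how such a result should be handled.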

        Attachments

        1. CustomPartitioner.patch
          14 kB
          Aniket Mokashi
        2. CustomPartitionerTest.patch
          24 kB
          Aniket Mokashi
        3. CustomPartitionerFinale.patch
          24 kB
          Aniket Mokashi

              People

              • Assignee:
                aniket486 Aniket Mokashi
                Reporter:
                amirhyoussefi Amir Youssefi
              • Votes:
                2
                Watchers:
                4

                Dates

                • Created:
                  Updated:
                  Resolved: