Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      This feature lets you specify a Hadoop Partitioner for the following operations: GROUP/COGROUP, CROSS, DISTINCT, and JOIN (except 'skewed' join). The Partitioner controls how the keys of the intermediate map outputs are partitioned across reducers. See http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/Partitioner.html for more details.

      To use this feature, add a PARTITION BY clause to the appropriate operator:
      A = load 'input_data';
      B = group A by $0 PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2;
      .....
      Here is the code for SimpleCustomPartitioner:

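      import org.apache.hadoop.io.Writable;
      import org.apache.hadoop.mapreduce.Partitioner;
      import org.apache.pig.impl.io.PigNullableWritable;

      public class SimpleCustomPartitioner extends Partitioner<PigNullableWritable, Writable> {
          @Override
          public int getPartition(PigNullableWritable key, Writable value, int numPartitions) {
              // The sign bit is masked off in both branches so the result is never negative.
              if (key.getValueAsPigType() instanceof Integer) {
                  // Integer keys are partitioned by their value.
                  int intKey = ((Integer) key.getValueAsPigType()).intValue();
                  return (intKey & Integer.MAX_VALUE) % numPartitions;
              } else {
                  // All other key types fall back to the key's hash code.
                  return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
              }
          }
      }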
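
      One detail of a partitioner like this is worth illustrating: Java's % operator keeps the sign of its left operand, so a negative integer key or a negative hash code yields a negative remainder, which falls outside the valid range 0 to numPartitions - 1. The dependency-free sketch below contrasts the plain remainder with the sign-bit mask that Hadoop's HashPartitioner applies. The class and method names (PartitionIndexSketch, naive, nonNegative) are illustrative assumptions and are not part of Pig or Hadoop.

```java
public final class PartitionIndexSketch {
    private PartitionIndexSketch() {}

    // Plain remainder: the result takes the sign of `value`, so it can be negative.
    public static int naive(int value, int numPartitions) {
        return value % numPartitions;
    }

    // Clear the sign bit first, so the result always lies in [0, numPartitions - 1].
    public static int nonNegative(int value, int numPartitions) {
        return (value & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 2; // same reducer count as "parallel 2" in the Pig script
        for (int key : new int[] {4, 7, -3}) {
            System.out.println(key + ": naive=" + naive(key, numPartitions)
                    + ", nonNegative=" + nonNegative(key, numPartitions));
        }
    }
}
```

      Note that masking does not preserve the mathematical modulus for negative keys; it only guarantees an index in the valid range, which is all a partitioner needs.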

      Description

      Adding a custom partitioner gives users control over which output partition a key (and its value) goes to. We can add keywords to the language, e.g.

      PARTITION BY UDF(...)

      or a similar syntax. The UDF returns a number between 0 and n-1, where n is the number of output partitions.
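
      The contract described here, a function that maps each key to an index between 0 and n-1, can be sketched without any Pig or Hadoop dependencies. The interface PartitionFunction and the helper checkedPartition below are hypothetical names used only for illustration; they are not part of Pig and do not represent the proposed syntax.

```java
public final class PartitionContractSketch {
    /** Hypothetical stand-in for the proposed UDF: maps a key to a partition index. */
    public interface PartitionFunction<K> {
        int partition(K key, int numPartitions);
    }

    private PartitionContractSketch() {}

    /** Calls the function and rejects any index outside [0, numPartitions - 1]. */
    public static <K> int checkedPartition(PartitionFunction<K> fn, K key, int numPartitions) {
        int index = fn.partition(key, numPartitions);
        if (index < 0 || index >= numPartitions) {
            throw new IllegalArgumentException(
                    "partition index " + index + " is outside 0.." + (numPartitions - 1));
        }
        return index;
    }

    public static void main(String[] args) {
        // Example function: route string keys by their length.
        PartitionFunction<String> byLength = (key, n) -> key.length() % n;
        System.out.println(checkedPartition(byLength, "pig", 2));
    }
}
```

      Checking the range at the call site is one possible design choice: it turns a misbehaving function into an immediate, descriptive error instead of a failure deeper in the shuffle.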

      1. CustomPartitionerTest.patch
        24 kB
        Aniket Mokashi
      2. CustomPartitionerFinale.patch
        24 kB
        Aniket Mokashi
      3. CustomPartitioner.patch
        14 kB
        Aniket Mokashi


            People

            • Assignee:
              Aniket Mokashi
              Reporter:
              Amir Youssefi
            • Votes:
              2
              Watchers:
              2
