Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      This feature lets you specify a Hadoop Partitioner for the following operations: GROUP/COGROUP, CROSS, DISTINCT, and JOIN (except 'skewed' join). The partitioner controls how the keys of the intermediate map outputs are partitioned. See http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/Partitioner.html for more details.

      To use this feature, add a PARTITION BY clause to the appropriate operator:
      A = load 'input_data';
      B = group A by $0 PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2;
      .....
      Here is the code for SimpleCustomPartitioner:

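      import org.apache.hadoop.io.Writable;
      import org.apache.hadoop.mapreduce.Partitioner;
      import org.apache.pig.impl.io.PigNullableWritable;

      public class SimpleCustomPartitioner extends Partitioner<PigNullableWritable, Writable> {
          @Override
          public int getPartition(PigNullableWritable key, Writable value, int numPartitions) {
              // Integer keys are partitioned by value; all other keys by hash code.
              // The mask clears the sign bit so the result is never negative.
              if (key.getValueAsPigType() instanceof Integer) {
                  return (((Integer) key.getValueAsPigType()).intValue() & Integer.MAX_VALUE) % numPartitions;
              } else {
                  return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
              }
          }
      }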
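
A partitioner has to return a value in the range 0 to numPartitions - 1. Java's % operator keeps the sign of the dividend, so a negative integer key or a negative hashCode() needs some care. The following is a minimal standalone sketch of that arithmetic only. It uses no Hadoop or Pig classes, and the class and method names (PartitionIndexSketch, rawRemainder, safePartition) are hypothetical names chosen for illustration.

```java
// Standalone sketch: no Hadoop or Pig classes are used here.
// PartitionIndexSketch and its methods are hypothetical names for illustration.
class PartitionIndexSketch {

    // Raw remainder: the result takes the sign of the dividend, so it can be negative.
    static int rawRemainder(int keyOrHash, int numPartitions) {
        return keyOrHash % numPartitions;
    }

    // Math.floorMod returns a value in [0, numPartitions) for any positive divisor.
    static int safePartition(int keyOrHash, int numPartitions) {
        return Math.floorMod(keyOrHash, numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        for (int key : new int[] {7, -7, Integer.MIN_VALUE}) {
            System.out.println(key + ": raw=" + rawRemainder(key, numPartitions)
                    + ", safe=" + safePartition(key, numPartitions));
        }
    }
}
```

Clearing the sign bit with `& Integer.MAX_VALUE` before taking the remainder is another common way to stay in range. It can assign negative keys to different partitions than Math.floorMod does, which is harmless as long as the same rule is applied consistently.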

      Description

      By adding a custom partitioner, we can control which output partition a key (or value) goes to. We can add keywords to the language, e.g.

      PARTITION BY UDF(...)

      or a similar syntax. The UDF returns a number between 0 and n-1, where n is the number of output partitions.
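
The contract proposed here (a function from key to a partition number between 0 and n-1) can be sketched in plain Java. This is only an illustration under stated assumptions, not Pig's API: the PartitionFunction interface, the BY_FIRST_LETTER rule, and the assign helper are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the proposed contract; this is not Pig's API.
class PartitionContractSketch {

    // A partition function maps a key to a partition number in [0, n).
    interface PartitionFunction {
        int partition(String key, int n);
    }

    // Example rule: keys starting with a..m go to partition 0; all other keys
    // are spread over the remaining partitions by hash code.
    static final PartitionFunction BY_FIRST_LETTER = (key, n) -> {
        if (n == 1) {
            return 0;
        }
        char c = key.isEmpty() ? 'z' : Character.toLowerCase(key.charAt(0));
        if (c >= 'a' && c <= 'm') {
            return 0;
        }
        return 1 + Math.floorMod(key.hashCode(), n - 1);
    };

    // Distributes keys into n buckets, rejecting any out-of-range partition number.
    static List<List<String>> assign(List<String> keys, int n, PartitionFunction f) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            int p = f.partition(key, n);
            if (p < 0 || p >= n) {
                throw new IllegalStateException("partition " + p + " out of range for key " + key);
            }
            buckets.get(p).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "pear", "zebra", "mango");
        System.out.println(assign(keys, 3, BY_FIRST_LETTER));
    }
}
```

The range check in assign mirrors the requirement stated above: a function that returns a value outside 0 to n-1 is treated as an error rather than silently wrapped.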

      1. CustomPartitionerFinale.patch
        24 kB
        Aniket Mokashi
      2. CustomPartitionerTest.patch
        24 kB
        Aniket Mokashi
      3. CustomPartitioner.patch
        14 kB
        Aniket Mokashi

          Activity

          Amir Youssefi created issue -
          Alan Gates made changes -
          Field Original Value New Value
          Link This issue is duplicated by PIG-478 [ PIG-478 ]
          Olga Natkovich made changes -
          Fix Version/s 0.8.0 [ 12314562 ]
          Daniel Dai made changes -
          Assignee Aniket Mokashi [ aniket486 ]
          Aniket Mokashi made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Release Note Initial changes
          Affects Version/s 0.7.0 [ 12314397 ]
          Aniket Mokashi made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Aniket Mokashi made changes -
          Attachment CustomPartitioner.patch [ 12445704 ]
          Aniket Mokashi made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Release Note Initial changes
          Aniket Mokashi made changes -
          Attachment CustomPartitionerTest.patch [ 12446067 ]
          Daniel Dai made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Daniel Dai made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Aniket Mokashi made changes -
          Attachment CustomPartitionerFinale.patch [ 12446172 ]
          Daniel Dai made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Daniel Dai made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Daniel Dai made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Olga Natkovich made changes -
          Release Note added (the PARTITION BY usage and SimpleCustomPartitioner example quoted in the Release Note above), with the Partitioner link pointing to http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Partitioner.html
          Viraj Bhat made changes -
          Release Note Partitioner link changed from http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Partitioner.html to http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/Partitioner.html; the rest of the text is unchanged
          Olga Natkovich made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Aniket Mokashi
              Reporter:
              Amir Youssefi
            • Votes:
              2
              Watchers:
              2
