Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      This feature allows specifying a Hadoop Partitioner for the following operations: GROUP/COGROUP, CROSS, DISTINCT, JOIN (except 'skewed' join). The Partitioner controls the partitioning of the keys of the intermediate map outputs. See http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/Partitioner.html for more details.

      To use this feature, add a PARTITION BY clause to the appropriate operator:
      A = load 'input_data';
      B = group A by $0 PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2;
      .....
      Here is the code for SimpleCustomPartitioner:

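      import org.apache.hadoop.io.Writable;
      import org.apache.hadoop.mapreduce.Partitioner;
      import org.apache.pig.impl.io.PigNullableWritable;

      public class SimpleCustomPartitioner extends Partitioner<PigNullableWritable, Writable> {
          @Override
          public int getPartition(PigNullableWritable key, Writable value, int numPartitions) {
              int hash;
              if (key.getValueAsPigType() instanceof Integer) {
                  // Integer keys are partitioned by their value.
                  hash = ((Integer) key.getValueAsPigType()).intValue();
              } else {
                  // Any other key type falls back to the key's hash code.
                  hash = key.hashCode();
              }
              // Clear the sign bit so that the result is never negative.
              return (hash & Integer.MAX_VALUE) % numPartitions;
          }
      }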

      Description

      Adding a custom partitioner gives users control over which output partition a key (or value) goes to. We can add keywords to the language, e.g.

      PARTITION BY UDF(...)

      or a similar syntax. The UDF returns a number between 0 and n-1, where n is the number of output partitions.
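
The contract described above (a result between 0 and n-1) can be sketched in plain Java. This is only an illustration, not the Pig implementation: the class name PartitionIndexSketch and the method partitionFor are hypothetical, and the key is treated as an arbitrary object whose hash code decides the partition.

```java
// Hypothetical sketch: map any key to a partition index in [0, numPartitions - 1].
final class PartitionIndexSketch {

    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Null keys are sent to partition 0 (an arbitrary choice for this sketch).
        int hash = (key == null) ? 0 : key.hashCode();
        // Clearing the sign bit keeps the remainder non-negative,
        // so the result always stays within 0 .. numPartitions - 1.
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor(42, 4));
        System.out.println(partitionFor(-7, 4));
        System.out.println(partitionFor("some key", 4));
    }
}
```

A plain `hash % numPartitions` would return a negative number for negative hash values, which falls outside the 0 to n-1 range; masking the sign bit is one common way to avoid that.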

      1. CustomPartitionerFinale.patch
        24 kB
        Aniket Mokashi
      2. CustomPartitionerTest.patch
        24 kB
        Aniket Mokashi
      3. CustomPartitioner.patch
        14 kB
        Aniket Mokashi

        Issue Links

          Activity

          Daniel Dai added a comment -

          Can you try to declare data type in your load statement?
          A = load 'input_data' as (a0:int, a1:chararray......);

          Colonel.Hou added a comment -

          Daniel Dai, thank you for your answer. The full exception is below:
          Backend error message
          ---------------------
          Error: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
          at com.compcc.ps.pig.TotalSortPartitioner.getPartition(TotalSortPartitioner.java:21)
          at com.compcc.ps.pig.TotalSortPartitioner.getPartition(TotalSortPartitioner.java:12)
          at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
          at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
          at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:135)
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
          at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
          at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
          at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:415)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
          at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

          My partitioner is:
          public class TotalSortPartitioner extends Partitioner<PigNullableWritable, Writable>
          {
              public static final Log log = LogFactory.getLog(TotalSortPartitioner.class);
              public static final int ONE_HOUR_MINUTE = 60;

              public int getPartition(PigNullableWritable key, Writable value, int numPartitions)
              {
                  int minute = -1;
                  try {
                      minute = Integer.parseInt(key.getValueAsPigType().toString());
                  } catch (RuntimeException e) {
                      log.error("Convert String minute to Integer error. " + e.getMessage());
                      return minute;
                  }

                  ...
                  return xxx;
              }
          }
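
The stack trace above reports a String being cast to Integer, and the error path of the partitioner above returns -1, which lies outside the 0 to n-1 range that a partition index is expected to fall in. A minimal sketch of a conversion that avoids both problems follows. It assumes the key value may arrive either as a number or as text; the names MinuteKeySketch, toMinute and partitionFor are hypothetical, and the choice of partition 0 as the fallback is arbitrary.

```java
// Hypothetical sketch: convert a key value to a minute without an unchecked cast,
// and never return a partition index outside 0 .. numPartitions - 1.
final class MinuteKeySketch {

    /** Returns the minute held by the value, or -1 when it cannot be read as an integer. */
    static int toMinute(Object value) {
        if (value instanceof Number) {
            return ((Number) value).intValue();
        }
        if (value != null) {
            try {
                return Integer.parseInt(value.toString().trim());
            } catch (NumberFormatException e) {
                return -1;
            }
        }
        return -1;
    }

    static int partitionFor(Object value, int numPartitions) {
        int minute = toMinute(value);
        if (minute < 0) {
            // Unusable keys go to a fixed, valid partition instead of -1.
            return 0;
        }
        return minute % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor(75, 60));
        System.out.println(partitionFor("75", 60));
        System.out.println(partitionFor("not a number", 60));
    }
}
```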

          Daniel Dai added a comment -

          Colonel.Hou, do you have complete stack for the exception?

          Colonel.Hou added a comment -

          Why does my custom partitioner throw a ClassCastException (org.apache.pig.impl.io.NullableTuple cannot be cast to org.apache.hadoop.io.Text)?

          The data type of the partition key is long (mydt AS sT:long). My partitioner is declared as: class xxx extends Partitioner<NullableLongWritable, Text> {
          GROUP theLast BY sT PARTITION BY com.test.partition.MySortPartitioner PARALLEL 12;

          Can anyone tell me why? Many thanks.
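
One plausible reading of this error, given that the release note declares partitioners as Partitioner<PigNullableWritable, Writable>, is that the value handed to the partitioner is not a Text, so the cast implied by the generic parameters fails at run time. The sketch below reproduces that mechanism in plain Java without Hadoop or Pig; PartitionerSketch, TextValuePartitioner and ErasureDemo are hypothetical names, and Long and String merely stand in for the key and value types.

```java
// Hypothetical stand-in for a generic partitioner base class.
abstract class PartitionerSketch<K, V> {
    abstract int getPartition(K key, V value, int numPartitions);
}

// Declares a narrower value type than the caller will actually pass.
final class TextValuePartitioner extends PartitionerSketch<Long, String> {
    @Override
    int getPartition(Long key, String value, int numPartitions) {
        return (int) ((key & Long.MAX_VALUE) % numPartitions);
    }
}

final class ErasureDemo {

    // The caller uses the raw type, much like a framework that only knows
    // the partitioner by its class name. The compiler-generated bridge method
    // casts the arguments to Long and String when the call is made.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean failsWithClassCast(PartitionerSketch partitioner, Object key, Object value) {
        try {
            partitioner.getPartition(key, value, 12);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        PartitionerSketch<Long, String> p = new TextValuePartitioner();
        System.out.println(failsWithClassCast(p, 5L, "text value"));
        System.out.println(failsWithClassCast(p, 5L, new Object()));
    }
}
```

Under this reading, declaring the value parameter with a type general enough to cover what is actually passed (Writable in the release note's example) would avoid the failing cast.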

          Dmitriy V. Ryaboy added a comment -

          Brad, ORDER produces a total order out of the box.

          Brad Tofel added a comment -

          Do I read this right - there is no way to specify a custom partitioner for use with "ORDER BY"?

          If so, is there any other way to perform a total ordering within Pig?

          I will be doing a STORE immediately after the ORDER - the relation will not be used again. Is there some other workaround to achieve this?

          I would love to replace my current Hadoop Java code with Pig, but total ordering is a requirement.

          Daniel Dai added a comment -

          Manual tests pass. The release audit warning is due to one additional jdiff artifact. Patch committed, thanks Aniket!

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12446172/CustomPartitionerFinale.patch
          against trunk revision 951229.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 380 release audit warnings (more than the trunk's current 379 warnings).

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/320/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/320/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/320/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/320/console

          This message is automatically generated.

          Aniket Mokashi added a comment -

          Added code review comments and some minor changes with test cases.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12446067/CustomPartitionerTest.patch
          against trunk revision 949057.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 386 release audit warnings (more than the trunk's current 385 warnings).

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h1.grid.sp2.yahoo.net/18/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h1.grid.sp2.yahoo.net/18/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h1.grid.sp2.yahoo.net/18/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h1.grid.sp2.yahoo.net/18/console

          This message is automatically generated.

          Aniket Mokashi added a comment -

          Adding test cases and some small fixes.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12445704/CustomPartitioner.patch
          against trunk revision 949057.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 385 release audit warnings (more than the trunk's current 384 warnings).

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h1.grid.sp2.yahoo.net/13/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h1.grid.sp2.yahoo.net/13/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h1.grid.sp2.yahoo.net/13/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h1.grid.sp2.yahoo.net/13/console

          This message is automatically generated.

          Aniket Mokashi added a comment -

          Initial changes

          Aniket Mokashi added a comment -

          Initial Changes

          Aniket Mokashi added a comment -

          1. It is more suitable for PARTITION BY to take a mapreduce.Partitioner class than a UDF. This will be followed by PARALLEL n.
          2. Applicable to:
          GROUP
          COGROUP
          CROSS
          DISTINCT
          JOIN (except 'skewed', which uses SkewedPartitioner)
          3. PARTITION BY on ORDER is not supported.
          4. The type parameters of a custom partitioner (<PigNullableWritable, Writable>) are not validated.

          Approach:
          1. Added support for ClassType parsing and validation. Parsing for "partition by" is added to each of the above-mentioned clauses separately.
          2. The custom partitioner is stored as a String in the LO, PO and MR plans. LogicalOperator holds the partitioner in the LO plan. We add the partitioner to POGlobalRearrangement because it decides the map-reduce boundary. We read and set the partitioner when we visit the POGlobalRearrangement.

          Attaching a patch with initial changes...
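
Storing the partitioner as a class-name string means the class has to be resolved by name later on. The sketch below shows one way such a lookup could work in plain Java, with a basic check that the named class really is a partitioner. It is not taken from the patch: KeyPartitioner, ModPartitioner and PartitionerLoader are hypothetical names, and the real code works with Hadoop's Partitioner class rather than this stand-in interface.

```java
// Hypothetical stand-in for the partitioner contract.
interface KeyPartitioner {
    int getPartition(Object key, Object value, int numPartitions);
}

// A trivial implementation used only to exercise the loader.
final class ModPartitioner implements KeyPartitioner {
    @Override
    public int getPartition(Object key, Object value, int numPartitions) {
        int hash = (key == null) ? 0 : key.hashCode();
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }
}

final class PartitionerLoader {

    /** Resolves a class name held as a string and instantiates it as a partitioner. */
    static KeyPartitioner load(String className) {
        try {
            // asSubclass rejects classes that do not implement KeyPartitioner.
            Class<? extends KeyPartitioner> cls =
                    Class.forName(className).asSubclass(KeyPartitioner.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Not a usable partitioner: " + className, e);
        }
    }

    public static void main(String[] args) {
        KeyPartitioner partitioner = load(ModPartitioner.class.getName());
        System.out.println(partitioner.getPartition(10, null, 4));
    }
}
```

A check like this only confirms that the class is a partitioner. It says nothing about the generic type parameters, which are erased at run time, and that is consistent with point 4 above.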

          Dmitriy V. Ryaboy added a comment -

          David,
          take a look at https://issues.apache.org/jira/browse/PIG-958 (it's in 0.6)

          Alan Gates added a comment -

          This JIRA refers to map->reduce partitioning. Output partitioning, i.e. spraying records into directories based on a key, can already be done via a custom store function.

          David Ciemiewicz added a comment -

          How will the custom partitioner be used in Pig?

          Is this for map partitioning and/or output partitioning?

          For instance, I'd love to have something that created separate directories based on the value of some key.

          Yiping Han added a comment -

          Any concerns on this issue?


            People

            • Assignee:
              Aniket Mokashi
              Reporter:
              Amir Youssefi
            • Votes:
              2
              Watchers:
              4
