Details
-
Sub-task
-
Status: Closed
-
Major
-
Resolution: Fixed
-
spark-branch
-
None
Description
Shuffle operations like DISTINCT, GROUP, JOIN, CROSS allow custom MR partitioners to be specified.
Example:
B = GROUP A BY $0 PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner PARALLEL 2; public class SimpleCustomPartitioner extends Partitioner <PigNullableWritable, Writable> { //@Override public int getPartition(PigNullableWritable key, Writable value, int numPartitions) { if(key.getValueAsPigType() instanceof Integer) { int ret = (((Integer)key.getValueAsPigType()).intValue() % numPartitions); return ret; } else { return (key.hashCode()) % numPartitions; } } }
Since Spark's shuffle APIs takes a different parititioner class (org.apache.spark.Partitioner) compared to MapReduce (org.apache.hadoop.mapreduce.Partitioner), we need to wrap custom partitioners written for MapReduce inside a Spark Partitioner.
Attachments
Attachments
Issue Links
- links to