[PIG-4565] Support custom MR partitioners for Spark engine - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: spark-branch
Fix Version/s: spark-branch
Component/s: spark
Labels: None

Description

Shuffle operations such as DISTINCT, GROUP, JOIN, and CROSS allow a custom MapReduce (MR) partitioner to be specified with the PARTITION BY clause.

Example:

B = GROUP A BY $0 PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner PARALLEL 2;

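package org.apache.pig.test.utils;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.pig.impl.io.PigNullableWritable;

public class SimpleCustomPartitioner extends Partitioner<PigNullableWritable, Writable> {
    @Override
    public int getPartition(PigNullableWritable key, Writable value, int numPartitions) {
        // Integer keys are partitioned by value, all other keys by hash code.
        // Math.floorMod keeps the result in [0, numPartitions) for negative inputs.
        if (key.getValueAsPigType() instanceof Integer) {
            return Math.floorMod(((Integer) key.getValueAsPigType()).intValue(), numPartitions);
        } else {
            return Math.floorMod(key.hashCode(), numPartitions);
        }
    }
}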

Since Spark's shuffle APIs take a different partitioner class (org.apache.spark.Partitioner) than MapReduce does (org.apache.hadoop.mapreduce.Partitioner), we need to wrap custom partitioners written for MapReduce inside a Spark Partitioner.
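The wrapping idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the attached patches. To stay self-contained it uses two stand-in abstract classes, MrPartitioner and SparkPartitioner, that mirror the method shapes of org.apache.hadoop.mapreduce.Partitioner and org.apache.spark.Partitioner. MrPartitionerAdapter, ModPartitioner and PartitionerAdapterDemo are hypothetical names. A real wrapper for Pig would presumably also have to convert the Spark key into a PigNullableWritable before delegating, which is omitted here.

```java
import java.util.Objects;

/** Stand-in (assumption) with the same shape as org.apache.hadoop.mapreduce.Partitioner<K, V>. */
abstract class MrPartitioner<K, V> {
    public abstract int getPartition(K key, V value, int numPartitions);
}

/** Stand-in (assumption) with the same shape as org.apache.spark.Partitioner. */
abstract class SparkPartitioner {
    public abstract int numPartitions();
    public abstract int getPartition(Object key);
}

/** Hypothetical adapter: exposes an MR-style partitioner through the Spark-style interface. */
final class MrPartitionerAdapter<K> extends SparkPartitioner {
    private final MrPartitioner<K, Object> delegate;
    private final int numPartitions;

    MrPartitionerAdapter(MrPartitioner<K, Object> delegate, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.delegate = Objects.requireNonNull(delegate, "delegate");
        this.numPartitions = numPartitions;
    }

    @Override
    public int numPartitions() {
        return numPartitions;
    }

    @Override
    @SuppressWarnings("unchecked")
    public int getPartition(Object key) {
        // The Spark-style call supplies only the key. The MR-style call also expects
        // a value and the partition count, so the value is passed as null here and
        // the count comes from the adapter's own configuration.
        return delegate.getPartition((K) key, null, numPartitions);
    }
}

/** Example MR-style partitioner used only by this demo: integer key modulo partition count. */
final class ModPartitioner extends MrPartitioner<Integer, Object> {
    @Override
    public int getPartition(Integer key, Object value, int numPartitions) {
        return Math.floorMod(key.intValue(), numPartitions);
    }
}

class PartitionerAdapterDemo {
    public static void main(String[] args) {
        SparkPartitioner p = new MrPartitionerAdapter<>(new ModPartitioner(), 2);
        System.out.println(p.getPartition(3)); // odd key, lands in partition 1
        System.out.println(p.getPartition(4)); // even key, lands in partition 0
    }
}
```

Passing null as the value works only for partitioners that, like the example above, ignore the value argument. A partitioner that inspects the value would need a different approach.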

Attachments

PIG-4565.3.patch, 29/May/15 05:56, 32 kB, Mohit Sabharwal
PIG-4565.2.patch, 27/May/15 22:59, 31 kB, Mohit Sabharwal
PIG-4565.1.patch, 27/May/15 00:03, 30 kB, Mohit Sabharwal
PIG-4565.patch, 21/May/15 03:30, 29 kB, Mohit Sabharwal

Issue Links

links to: review board

People

Assignee: Mohit Sabharwal

Reporter: Mohit Sabharwal

Votes: 0

Watchers: 3

Dates

Created: 21/May/15 01:55

Updated: 21/Jun/17 09:18

Resolved: 29/May/15 13:16