[PIG-5029] Optimize sort case when data is skewed - ASF JIRA

Details

Type: Sub-task
Status: Patch Available
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: spark-branch
Component/s: spark
Labels: None

Description

The problem shows up in the PigMix script L9.pig:

register $PIGMIX_JAR
A = load '$HDFS_ROOT/page_views' using org.apache.pig.test.pigmix.udf.PigPerformanceLoader()
    as (user, action, timespent, query_term, ip_addr, timestamp,
        estimated_revenue, page_info, page_links);
B = order A by query_term parallel $PARALLEL;
store B into '$PIGMIX_OUTPUT/L9out';

Pig's physical plan is converted to a Spark plan, which produces the following RDD lineage:

[main] 2016-09-08 01:49:09,844 DEBUG converter.StoreConverter (StoreConverter.java:convert(110)) - RDD lineage: (23) MapPartitionsRDD[8] at map at StoreConverter.java:80 []
 |   MapPartitionsRDD[7] at mapPartitions at SortConverter.java:58 []
 |   ShuffledRDD[6] at sortByKey at SortConverter.java:56 []
 +-(23) MapPartitionsRDD[3] at map at SortConverter.java:49 []
    |   MapPartitionsRDD[2] at mapPartitions at ForEachConverter.java:64 []
    |   MapPartitionsRDD[1] at map at LoadConverter.java:127 []
    |   NewHadoopRDD[0] at newAPIHadoopRDD at LoadConverter.java:102 []

The sort is implemented with sortByKey. RDD.sortByKey uses RangePartitioner, which samples the data and divides the keys into ranges of roughly equal size. Even so, the test results (see the attached document) show that when the data is skewed, a single partition receives most of the keys and takes a long time to finish.
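
Why sampled range boundaries cannot balance a dominant key can be illustrated with a small standalone simulation. This is only a sketch in plain Python, not Pig or Spark code: the helper names (`range_bounds`, `partition_sizes`, `make_skewed_keys`), the 80% "hot" key share, and the systematic every-20th-row sample are all assumptions made for illustration. It mimics the general idea of a range partitioner: take boundaries from a sorted sample, drop duplicate boundaries, and send each key to the first range whose upper bound is not smaller than the key.

```python
import bisect


def range_bounds(sample, num_partitions):
    """Pick up to num_partitions - 1 upper bounds from equal slices of a sorted sample."""
    ordered = sorted(sample)
    step = len(ordered) / num_partitions
    bounds = []
    for i in range(1, num_partitions):
        candidate = ordered[min(int(i * step), len(ordered) - 1)]
        # A boundary equal to the previous one adds no new range, so skip it.
        if not bounds or candidate > bounds[-1]:
            bounds.append(candidate)
    return bounds


def partition_sizes(keys, bounds, num_partitions):
    """Count how many keys fall into each range partition."""
    sizes = [0] * num_partitions
    for key in keys:
        # Partition index = number of bounds strictly smaller than the key.
        sizes[bisect.bisect_left(bounds, key)] += 1
    return sizes


def make_skewed_keys(hot=8000, distinct=2000):
    """Hypothetical data set: one dominant key plus many distinct keys."""
    return ["hot"] * hot + ["q%04d" % i for i in range(distinct)]


if __name__ == "__main__":
    keys = make_skewed_keys()
    sample = keys[::20]  # simplified systematic sample (assumption)
    bounds = range_bounds(sample, 4)
    print("bounds:", bounds)
    print("partition sizes:", partition_sizes(keys, bounds, 4))
```

With these made-up numbers, all three candidate boundaries fall on the dominant key and collapse into one, so 8000 of the 10000 rows land in the first partition and two of the four requested partitions stay empty. All rows with the same key must go to the same range, so no choice of boundaries alone can spread a dominant key across partitions.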

Attachments

SkewedData_L9.docx (12/Sep/16 02:49, 1.10 MB, liyunzhang)
PIG-5051_5029_5.patch (01/Nov/16 09:01, 11 kB, liyunzhang)
PIG-5029.patch (13/Sep/16 07:18, 8 kB, liyunzhang)
PIG-5029_3.patch (26/Oct/16 06:04, 51 kB, liyunzhang)
PIG-5029_2.patch (28/Sep/16 08:50, 10 kB, liyunzhang)

People

Assignee: liyunzhang
Reporter: liyunzhang
Votes: 0
Watchers: 7

Dates

Created: 12/Sep/16 01:50
Updated: 23/Nov/16 07:28