Details
-
Improvement
-
Status: Reopened
-
Minor
-
Resolution: Unresolved
-
v4.0.0-alpha
-
None
Description
Optimize join by reducing shuffle when there is the same shard by partition number on join key.
When execute this sql,
// code placeholder select m.seller_id, m.part_dt, sum(m.price) as s from kylin_sales m left join ( select m1.part_dt as pd, count(distinct m1.SELLER_ID) as m1, count(1) as m2 from kylin_sales m1 where m1.part_dt = '2012-01-05' group by m1.part_dt ) j on m.part_dt = j.pd where m.lstg_format_name = 'FP-GTC' and m.part_dt = '2012-01-05' group by m.seller_id, m.part_dt limit 100;
the execution plan is shown below:
But the join key part_dt has the same shard by partition number, it can be optimized to reduce shuffle, similar to bucket join.