[KYLIN-4762] Optimize join where there is the same shardby partition num on join key - ASF JIRA

Details

Type: Improvement
Status: Reopened
Priority: Minor
Resolution: Unresolved
Affects Version/s: v4.0.0-alpha
Fix Version/s: v4.1.0
Component/s: Query Engine
Labels: None

Description

Optimize joins by reducing the shuffle when both sides have the same shard-by partition number on the join key.

Consider the following SQL:

select m.seller_id, m.part_dt, sum(m.price) as s
from kylin_sales m
left join (
  select m1.part_dt as pd, count(distinct m1.SELLER_ID) as m1, count(1) as m2
  from kylin_sales m1
  where m1.part_dt = '2012-01-05'
  group by m1.part_dt
) j
on m.part_dt = j.pd
where m.lstg_format_name = 'FP-GTC'
  and m.part_dt = '2012-01-05'
group by m.seller_id, m.part_dt
limit 100;

The execution plan for this query is shown in the attachment shardby_join.png.

However, both sides of the join have the same shard-by partition number on the join key part_dt, so the join can be optimized to reduce the shuffle, similar to a bucket join.
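To illustrate why matching shard counts on the join key make the shuffle unnecessary, here is a minimal, engine-independent sketch in Python. It is not Kylin's or Spark's implementation: the helper names (`shard_of`, `shard_rows`, `shard_local_left_join`), the use of `crc32` as the hash, the shard count of 4, and the toy rows are all assumptions made for the example. It only shows that when both inputs are sharded by the same hash of the join key into the same number of shards, joining shard i with shard i gives the same rows as a global left join.

```python
from collections import Counter, defaultdict
import zlib

NUM_SHARDS = 4  # assumed: both sides were built with the same shard count


def shard_of(key, num_shards=NUM_SHARDS):
    """Map a join-key value to a shard; crc32 stands in for the engine's hash."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def shard_rows(rows, key_index, num_shards=NUM_SHARDS):
    """Split rows into shards by hashing the column at key_index."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_of(row[key_index], num_shards)].append(row)
    return shards


def left_join(left, right, left_key, right_key, right_width):
    """Plain hash left join over two lists of tuples."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    joined = []
    for l in left:
        # Unmatched left rows are padded with None, as in a SQL left join.
        matches = index.get(l[left_key]) or [(None,) * right_width]
        joined.extend(l + r for r in matches)
    return joined


def shard_local_left_join(left_shards, right_shards, left_key, right_key, right_width):
    """Join shard i of the left side only with shard i of the right side."""
    if len(left_shards) != len(right_shards):
        raise ValueError("shard counts differ; a shuffle would be required")
    joined = []
    for left, right in zip(left_shards, right_shards):
        joined.extend(left_join(left, right, left_key, right_key, right_width))
    return joined


# Toy rows (hypothetical): (seller_id, part_dt, price) and (pd, m1, m2).
sales = [
    (1, "2012-01-05", 10.0),
    (2, "2012-01-05", 20.0),
    (3, "2012-01-06", 5.0),
    (1, "2012-01-07", 7.5),
]
per_day = [("2012-01-05", 2, 2), ("2012-01-06", 1, 1)]

global_result = left_join(sales, per_day, 1, 0, 3)
local_result = shard_local_left_join(
    shard_rows(sales, 1), shard_rows(per_day, 0), 1, 0, 3
)

# Both strategies produce the same multiset of rows; the shard-local
# version never moves a row from one shard to another.
same_rows = Counter(global_result) == Counter(local_result)
```

Equal keys always hash to the same shard index on both sides, so no row ever needs a partner from a different shard. That is the property the proposed optimization would rely on.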

Attachments

shardby_join.png
18/Sep/20 01:02
65 kB
Zhichao Zhang

People

Assignee: Zhichao Zhang

Reporter: Zhichao Zhang

Votes: 1

Watchers: 3

Dates

Created: 18/Sep/20 01:02

Updated: 31/Aug/22 12:18