Uploaded image for project: 'Kylin'
  1. Kylin
  2. KYLIN-4762

Optimize join where there is the same shardby partition num on join key

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Reopened
    • Minor
    • Resolution: Unresolved
    • v4.0.0-alpha
    • v4.1.0
    • Query Engine
    • None

    Description

      Optimize join by reducing shuffle when there is the same shard by partition number on join key.

      When execute this sql,

      // code placeholder
      select m.seller_id, m.part_dt, sum(m.price) as s 
      from kylin_sales m 
      left join (
        select m1.part_dt as pd, count(distinct m1.SELLER_ID) as m1, count(1) as m2  
        from kylin_sales m1
        where m1.part_dt = '2012-01-05'
        group by m1.part_dt 
        ) j 
        on m.part_dt = j.pd
        where m.lstg_format_name = 'FP-GTC' 
        and m.part_dt = '2012-01-05'
        group by m.seller_id, m.part_dt limit 100;
      

      the execution plan is shown below:

      But the join key part_dt has the same shard by partition number, it can be optimized to reduce shuffle, similar to bucket join.

      Attachments

        1. shardby_join.png
          65 kB
          Zhichao Zhang

        Activity

          People

            zzcclp Zhichao Zhang
            zzcclp Zhichao Zhang
            Votes:
            1 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated: