Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-468

Introduce semi join reduction optimization in Calcite

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      The basic idea is to apply join predicates early in a plan in order to reduce the size of intermediate query results and, thus, reduce the cost of other operations. In other words, the idea is to apply the same join predicates twice or more often in a query plan

      In order to reduce the communication costs of a distributed system. Obviously, semi-join reducers are only effective if the (redundant) semi-joins are cheap and result in a significant reduction of the size of intermediate
      query results.

      I propose to extend a query optimizer and integrate semi-join reducer and
      join-ordering, etc. into a single query optimization step

      Several TPC-DS queries like 24, 64 & 80 run very slow do to the lake of semi join reduction optimization in Calcite.

      Doing a rewrite of Q64 to simulate semi join reduction produced 4x gains.

      Query	               	Total time	     CPU 	Intermediate rows (Million)
      Baseline			1,377	 356,900	                    23,940
      Semi Join Reduction		343	  47,253	                        23
      

      Q64 subset

      select 
          count(*)
      FROM
          store_sales
              JOIN
          item ON store_sales.ss_item_sk = item.i_item_sk
              JOIN
          store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk
              JOIN
          (select 
              cs_item_sk
          from
              catalog_sales
          JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
          group by cs_item_sk
          having 
      sum(cs_ext_list_price) > 2 * sum(cr_refunded_cash + cr_reversed_charge + cr_store_credit)) cs_ui 
      ON store_sales.ss_item_sk = cs_ui.cs_item_sk
      WHERE
          i_color in ('maroon' , 'burnished',
              'dim',
              'steel',
              'navajo',
              'chocolate')
              and i_current_price between 35 and 35 + 10
              and i_current_price between 35 + 1 and 35 + 15
      

        Attachments

        1. BaselineTree.png
          152 kB
          Mostafa Mokhtar
        2. SemiJoinReductionTreee.png
          154 kB
          Mostafa Mokhtar
        3. SemiJoinReductionGains.png
          101 kB
          Mostafa Mokhtar

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                mmokhtar Mostafa Mokhtar
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated: