Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3686

Planner : Implement aggregate push down rule

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • Impala 2.5.0
    • None
    • Frontend

    Description

      Join followed by Group by on the same key is a recurring pattern which the Planner can optimize by pushing down a partial aggregate below the join to reduce the over cardinality.

      TPC-DS query 23 is on of several queries that can benefit from such optimization

      The pattern looks like the following

      select 
              c_customer_sk, sum(ss_quantity * ss_sales_price) csales
          from
              store_sales, customer, date_dim
          where
              ss_customer_sk = c_customer_sk
                  and ss_sold_date_sk = d_date_sk
                  and d_year in (1999 , 1999 + 1, 1999 + 2, 1999 + 3)
          group by c_customer_sk
      

      A rewritten version of the query produces a 2x speedup since the NDV for ss_customer_sk is much smaller than number of rows in store_sales.

      with ss as 
      (select ss_customer_sk,sum(ss_quantity * ss_sales_price) csales
      from
          store_sales,
          date_dim
      where ss_sold_date_sk = d_date_sk
      and d_year in (1999 , 1999 + 1, 1999 + 2, 1999 + 3)
      group by ss_customer_sk
      )
      select count(*) from (select 
          c_customer_sk, sum(csales) csales
      from
          ss,
          customer
      where
          ss_customer_sk = c_customer_sk        
      group by c_customer_sk)a;
      

      Attachments

        1. query23.sql.2.out
          1.25 MB
          Mostafa Mokhtar

        Activity

          People

            Unassigned Unassigned
            mmokhtar Mostafa Mokhtar
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated: