Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-27280

infer filters from Join's OR condition

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • 3.1.0
    • None
    • Optimizer, SQL
    • None

    Description

      In some case, We can infer filters from Join condition with OR expressions.

      for example, tpc-ds query 48:

      select sum (ss_quantity)
       from store_sales, store, customer_demographics, customer_address, date_dim
       where s_store_sk = ss_store_sk
       and  ss_sold_date_sk = d_date_sk and d_year = 2000
       and  
       (
        (
         cd_demo_sk = ss_cdemo_sk
         and 
         cd_marital_status = 'S'
         and 
         cd_education_status = 'Secondary'
         and 
         ss_sales_price between 100.00 and 150.00  
         )
       or
        (
        cd_demo_sk = ss_cdemo_sk
         and 
         cd_marital_status = 'M'
         and 
         cd_education_status = 'College'
         and 
         ss_sales_price between 50.00 and 100.00   
        )
       or 
       (
        cd_demo_sk = ss_cdemo_sk
        and 
         cd_marital_status = 'U'
         and 
         cd_education_status = '2 yr Degree'
         and 
         ss_sales_price between 150.00 and 200.00  
       )
       )
       and
       (
        (
        ss_addr_sk = ca_address_sk
        and
        ca_country = 'United States'
        and
        ca_state in ('AL', 'OH', 'MD')
        and ss_net_profit between 0 and 2000  
        )
       or
        (ss_addr_sk = ca_address_sk
        and
        ca_country = 'United States'
        and
        ca_state in ('VA', 'TX', 'IA')
        and ss_net_profit between 150 and 3000 
        )
       or
        (ss_addr_sk = ca_address_sk
        and
        ca_country = 'United States'
        and
        ca_state in ('RI', 'WI', 'KY')
        and ss_net_profit between 50 and 25000 
        )
       )
      ;
      

      we can infer two filters from the join or condidtion:

      for customer_demographics:
      cd_marital_status in(‘D',‘U',‘M') and cd_education_status in('4 yr Degree’,’Secondary’,’Primary')
      
      for store_sales:
       (ss_sales_price between 100.00 and 150.00 or ss_sales_price between 50.00 and 100.00 or ss_sales_price between 150.00 and 200.00)
      

      then then we can push down the above two filters to filter customer_demographics/store_sales.

      A pr will be submit soon.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              windpiger Song Jun
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: