Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-399

Factorize common AND factors out of OR predicates

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.1-incubating
    • Component/s: None
    • Labels:
      None

      Description

      If a predicate is of the from p AND q1 OR p AND q2 it would help to factorize out the AND factor p, to the equivalent expression p AND (q1 OR q2). This is useful when p contains predicates from just one side of a join and can be pushed down, whereas the whole predicate references columns from both sides.

      Converting to DNF is not sufficient, and in the worst case exponentially increases the size of the expression.

      In the following query, factorization will allow us to push down store.s_store_sk = store_sales.ss_store_sk AND store_sales.ss_sold_date_sk = date_dim.d_date_sk.

      SELECT avg(ss_quantity) ,
             avg(ss_ext_sales_price) ,
             avg(ss_ext_wholesale_cost) ,
             sum(ss_ext_wholesale_cost)
      FROM store_sales ,
           store ,
           customer_demographics ,
           household_demographics ,
           customer_address ,
           date_dim
      WHERE store.s_store_sk = store_sales.ss_store_sk
        AND store_sales.ss_sold_date_sk = date_dim.d_date_sk
        AND date_dim.d_year = 2001
        AND((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
             AND customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
             AND customer_demographics.cd_marital_status = 'M'
             AND customer_demographics.cd_education_status = '4 yr Degree'
             AND store_sales.ss_sales_price BETWEEN 100.00 AND 150.00
             AND household_demographics.hd_dep_count = 3)
            OR (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
                AND customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
                AND customer_demographics.cd_marital_status = 'D'
                AND customer_demographics.cd_education_status = 'Primary'
                AND store_sales.ss_sales_price BETWEEN 50.00 AND 100.00
                AND household_demographics.hd_dep_count = 1 )
            OR (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
                AND customer_demographics.cd_demo_sk = ss_cdemo_sk
                AND customer_demographics.cd_marital_status = 'U'
                AND customer_demographics.cd_education_status = 'Advanced Degree'
                AND store_sales.ss_sales_price BETWEEN 150.00 AND 200.00
                AND household_demographics.hd_dep_count = 1))
        AND((store_sales.ss_addr_sk = customer_address.ca_address_sk
             AND customer_address.ca_country = 'United States'
             AND customer_address.ca_state IN ('KY',
                                               'GA',
                                               'NM')
             AND store_sales.ss_net_profit BETWEEN 100 AND 200)
            OR (store_sales.ss_addr_sk = customer_address.ca_address_sk
                AND customer_address.ca_country = 'United States'
                AND customer_address.ca_state IN ('MT',
                                                  'OR',
                                                  'IN')
                AND store_sales.ss_net_profit BETWEEN 150 AND 300)
            OR (store_sales.ss_addr_sk = customer_address.ca_address_sk
                AND customer_address.ca_country = 'United States'
                AND customer_address.ca_state IN ('WI',
                                                  'MO',
                                                  'WV')
                AND store_sales.ss_net_profit BETWEEN 50 AND 250)) ;
      

        Issue Links

          Activity

          Hide
          julianhyde Julian Hyde added a comment - - edited

          Related work: http://ilpubs.stanford.edu:8090/613/1/2003-56.pdf or http://delivery.acm.org/10.1145/880000/872802/p361-chaudhuri.pdf "Factorizing complex predicates in queries to exploit indexes"
          (Surajit Chaudhuri, Prasanna Ganesan, Sunita Sarawagi SIGMOD '03)

          Show
          julianhyde Julian Hyde added a comment - - edited Related work: http://ilpubs.stanford.edu:8090/613/1/2003-56.pdf or http://delivery.acm.org/10.1145/880000/872802/p361-chaudhuri.pdf "Factorizing complex predicates in queries to exploit indexes" (Surajit Chaudhuri, Prasanna Ganesan, Sunita Sarawagi SIGMOD '03)
          Hide
          julianhyde Julian Hyde added a comment -

          I have a tentative fix at https://github.com/julianhyde/optiq/tree/pullFactors. Let me know whether it solves your use case.

          Show
          julianhyde Julian Hyde added a comment - I have a tentative fix at https://github.com/julianhyde/optiq/tree/pullFactors . Let me know whether it solves your use case.
          Hide
          julianhyde Julian Hyde added a comment -

          Let me know whether the fix I have made in the branch works for you. Only then will I merge to master.

          Show
          julianhyde Julian Hyde added a comment - Let me know whether the fix I have made in the branch works for you. Only then will I merge to master.
          Hide
          jpullokkaran Laljo John Pullokkaran added a comment -

          Yes, it seems to work.
          Thanks

          Show
          jpullokkaran Laljo John Pullokkaran added a comment - Yes, it seems to work. Thanks
          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-optiq/commit/fd20d5c4 .

            People

            • Assignee:
              jpullokkaran Laljo John Pullokkaran
              Reporter:
              jpullokkaran Laljo John Pullokkaran
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development