Description
Query
select count(*)
from store_sales, store_returns, date_dim d1, date_dim d2
where d1.d_quarter_name = '2000Q1'
  and d1.d_date_sk = ss_sold_date_sk
  and ss_customer_sk = sr_customer_sk
  and ss_item_sk = sr_item_sk
  and ss_ticket_number = sr_ticket_number
  and sr_returned_date_sk = d2.d_date_sk
  and d2.d_quarter_name in ('2000Q1', '2000Q2', '2000Q3');
The store_sales table is partitioned on ss_sold_date_sk, which is also used in a join condition. Because an equi-join can never match a row whose join key is NULL, the join should add the filter "filterExpr: ss_sold_date_sk is not null", and that filter should be pushed to the MetaStore when fetching the stats. Currently CBO planning does not do this, so the stats of __HIVE_DEFAULT_PARTITION__ (the partition that holds rows with a NULL ss_sold_date_sk) are fetched and considered in the optimization phase. In particular, this inflates the NDV of the join columns and may result in a wrong plan.
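To make the stats effect concrete, here is a small Java sketch under simplifying assumptions. The Partition record, the sample values and the exact-set NDV computation are all hypothetical: this is not the MetaStore API, and Hive merges per-partition NDV estimates rather than exact value sets. A partition key of null stands for __HIVE_DEFAULT_PARTITION__.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DefaultPartitionStatsSketch {
    // Hypothetical, simplified partition model (not Hive's API).
    // dateSk == null stands for rows stored in __HIVE_DEFAULT_PARTITION__.
    record Partition(Long dateSk, long rowCount, Set<Long> customerSks) {}

    // Exact NDV of a join column over the selected partitions (union of values).
    // Hive itself combines per-partition estimates; exact sets are used here for clarity.
    static long customerNdv(List<Partition> parts, boolean notNullFilter) {
        Set<Long> all = new HashSet<>();
        for (Partition p : parts) {
            if (notNullFilter && p.dateSk() == null) {
                continue; // models "ss_sold_date_sk is not null" pushed to stats fetching
            }
            all.addAll(p.customerSks());
        }
        return all.size();
    }

    // Total row count over the selected partitions.
    static long rowCount(List<Partition> parts, boolean notNullFilter) {
        long total = 0;
        for (Partition p : parts) {
            if (notNullFilter && p.dateSk() == null) {
                continue;
            }
            total += p.rowCount();
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up sample data: two regular partitions and the default partition.
        List<Partition> parts = List.of(
                new Partition(2451545L, 3, Set.of(1L, 2L, 3L)),
                new Partition(2451546L, 2, Set.of(2L, 3L)),
                new Partition(null, 4, Set.of(7L, 8L, 9L, 10L)));
        // prints "without filter: rows=9 ndv=7"
        System.out.println("without filter: rows=" + rowCount(parts, false) + " ndv=" + customerNdv(parts, false));
        // prints "with filter: rows=5 ndv=3"
        System.out.println("with filter: rows=" + rowCount(parts, true) + " ndv=" + customerNdv(parts, true));
    }
}
```

Skipping the default partition lowers both the row count and the join-column NDV that the cost model would otherwise see.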
Including HiveJoinAddNotNullRule in the optimization phase solves this issue.
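Why the rule is safe can be sketched in a few lines of Java. This is a hypothetical model, not HiveJoinAddNotNullRule itself (which rewrites the Calcite plan during CBO); Row, sqlEquals, innerJoin and notNull are illustrative names. Since SQL equality with a NULL operand is never true, removing NULL join keys from both inputs cannot change the result of an inner equi-join.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinAddNotNullSketch {
    // Hypothetical row model: only the join key matters; null models SQL NULL.
    record Row(Long key, String payload) {}

    // SQL equality: NULL = x is UNKNOWN, which an inner join treats as "no match".
    static boolean sqlEquals(Long a, Long b) {
        return a != null && b != null && a.equals(b);
    }

    // Naive nested-loop inner equi-join on the key.
    static List<String> innerJoin(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        for (Row l : left) {
            for (Row r : right) {
                if (sqlEquals(l.key(), r.key())) {
                    out.add(l.payload() + "|" + r.payload());
                }
            }
        }
        return out;
    }

    // The filter the rule adds conceptually: "key IS NOT NULL" on a join input.
    static List<Row> notNull(List<Row> rows) {
        List<Row> kept = new ArrayList<>();
        for (Row r : rows) {
            if (r.key() != null) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Row> sales = List.of(new Row(1L, "s1"), new Row(null, "s2"), new Row(2L, "s3"));
        List<Row> dates = List.of(new Row(1L, "d1"), new Row(2L, "d2"), new Row(null, "d3"));
        List<String> plain = innerJoin(sales, dates);
        List<String> filtered = innerJoin(notNull(sales), notNull(dates));
        System.out.println(plain);                   // prints [s1|d1, s3|d2]
        System.out.println(plain.equals(filtered));  // prints true
    }
}
```

The result is unchanged, but the filtered inputs let the optimizer exclude the NULL-keyed partition when fetching stats.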
Issue Links
- blocks
  - HIVE-11865 Disable Hive PPD optimizer when CBO has optimized the plan (Closed)
- is related to
  - HIVE-12478 Improve Hive/Calcite Transitive Predicate inference (Closed)
- relates to
  - HIVE-11764 Verify the correctness of groupby_cube1.q with MR, Tez and Spark Mode with HIVE-11110 (Open)
  - HIVE-11918 Implement/Enable constant related optimization rules in Calcite (Resolved)
- requires
  - HIVE-11151 Calcite transitive predicate inference rule should not transitively add not null filter on non-nullable input (Closed)
  - HIVE-11152 Swapping join inputs in ASTConverter (Closed)
- links to