The store_sales table is partitioned on ss_sold_date_sk, which is also used in a join clause. The join clause should add the filter "filterExpr: ss_sold_date_sk is not null", and this filter should be pushed to the MetaStore when fetching the stats. Currently CBO planning does not do this, so the stats of __HIVE_DEFAULT_PARTITION__ (the partition that holds rows whose ss_sold_date_sk is NULL) are fetched and considered in the optimization phase. In particular, this inflates the NDV of the join columns and may result in a wrong plan.
Including HiveJoinAddNotNullRule in the optimization phase solves this issue.
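To illustrate the effect, here is a minimal Java sketch under simplifying assumptions. Partitions are modeled as plain strings, and the class, the method names (notNullFilters, partitionsForStats) and the partition values are hypothetical. It is not the HiveJoinAddNotNullRule implementation, which operates on Calcite plan nodes. It only shows how a derived "ss_sold_date_sk is not null" predicate lets the default partition (named __HIVE_DEFAULT_PARTITION__ unless hive.exec.default.partition.name is changed) be skipped when stats are fetched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hive's HiveJoinAddNotNullRule: models how a derived
// "partition key is not null" predicate excludes the default partition.
public class NotNullPruningSketch {
    // Default name of the partition that holds NULL partition-key values.
    static final String DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    // For each join key, derive the predicate "<key> is not null".
    static List<String> notNullFilters(List<String> joinKeys) {
        List<String> filters = new ArrayList<>();
        for (String key : joinKeys) {
            filters.add(key + " is not null");
        }
        return filters;
    }

    // Keep only the partitions whose stats should be fetched. When the partition
    // key is known to be non-null, the default partition cannot contribute rows.
    static List<String> partitionsForStats(List<String> partitionValues, boolean partKeyNotNull) {
        List<String> kept = new ArrayList<>();
        for (String value : partitionValues) {
            if (partKeyNotNull && value.equals(DEFAULT_PARTITION)) {
                continue; // pruned: its stats would inflate the join-column NDV
            }
            kept.add(value);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> filters = notNullFilters(List.of("ss_sold_date_sk"));
        boolean partKeyNotNull = filters.contains("ss_sold_date_sk is not null");

        // Hypothetical partition values of store_sales.ss_sold_date_sk.
        List<String> partitions = List.of("2450816", "2450817", DEFAULT_PARTITION);

        // prints "without filter: [2450816, 2450817, __HIVE_DEFAULT_PARTITION__]"
        System.out.println("without filter: " + partitionsForStats(partitions, false));
        // prints "with filter: [2450816, 2450817]"
        System.out.println("with filter: " + partitionsForStats(partitions, partKeyNotNull));
    }
}
```

In this model, only the second list omits __HIVE_DEFAULT_PARTITION__, so its stats (and their NDV contribution) would no longer reach the optimizer.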