Description
It appears that when inequality predicate is applied on a fact to dimension join the filter selectivity on the dimension table is not reflected on the join cardinality estimation.
This issue was first found in TPC-DS Q14 attached.
Query
explain select ss_quantity quantity from store_sales, date_dim where ss_sold_date_sk = d_date_sk and d_year < 2000
Plan
+----------------------------------------------------------+
| Explain String |
+----------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=88.06MB VCores=2 |
| |
| 04:EXCHANGE [UNPARTITIONED] |
| | hosts=8 per-host-mem=unavailable |
| | tuple-ids=0,1 row-size=16B cardinality=2879987999 |
| | |
| 02:HASH JOIN [INNER JOIN, BROADCAST] |
| | hash predicates: ss_sold_date_sk = d_date_sk |
| | runtime filters: RF000 <- d_date_sk |
| | hosts=8 per-host-mem=62.78KB |
| | tuple-ids=0,1 row-size=16B cardinality=2879987999 |
| | |
| |--03:EXCHANGE [BROADCAST] |
| | | hosts=1 per-host-mem=0B |
| | | tuple-ids=1 row-size=8B cardinality=7305 |
| | | |
| | 01:SCAN HDFS [tpcds_1000_parquet.date_dim, RANDOM] |
| | partitions=1/1 files=1 size=2.17MB |
| | predicates: d_year < 2000 |
| | table stats: 73049 rows total |
| | column stats: all |
| | hosts=1 per-host-mem=32.00MB |
| | tuple-ids=1 row-size=8B cardinality=7305 |
| | |
| 00:SCAN HDFS [tpcds_1000_parquet.store_sales, RANDOM] |
| partitions=1824/1824 files=1824 size=189.24GB |
| runtime filters: RF000 -> ss_sold_date_sk |
| table stats: 2879987999 rows total |
| column stats: all |
| hosts=8 per-host-mem=88.00MB |
| tuple-ids=0 row-size=8B cardinality=2879987999 |
+----------------------------------------------------------+
When an equality predicate is used the selectivity of the filter on the dimension table is reflected on the join
select ss_quantity from store_sales, date_dim where ss_sold_date_sk = d_date_sk and d_year = 1999;
+----------------------------------------------------------+
| Explain String |
+----------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=88.00MB VCores=2 |
| |
| 04:EXCHANGE [UNPARTITIONED] |
| | hosts=8 per-host-mem=unavailable |
| | tuple-ids=0,1 row-size=16B cardinality=588944915 |
| | |
| 02:HASH JOIN [INNER JOIN, BROADCAST] |
| | hash predicates: ss_sold_date_sk = d_date_sk |
| | runtime filters: RF000 <- d_date_sk |
| | hosts=8 per-host-mem=3.21KB |
| | tuple-ids=0,1 row-size=16B cardinality=588944915 |
| | |
| |--03:EXCHANGE [BROADCAST] |
| | | hosts=1 per-host-mem=0B |
| | | tuple-ids=1 row-size=8B cardinality=373 |
| | | |
| | 01:SCAN HDFS [tpcds_1000_parquet.date_dim, RANDOM] |
| | partitions=1/1 files=1 size=2.17MB |
| | predicates: d_year = 1999 |
| | table stats: 73049 rows total |
| | column stats: all |
| | hosts=1 per-host-mem=32.00MB |
| | tuple-ids=1 row-size=8B cardinality=373 |
| | |
| 00:SCAN HDFS [tpcds_1000_parquet.store_sales, RANDOM] |
| partitions=1824/1824 files=1824 size=189.24GB |
| runtime filters: RF000 -> ss_sold_date_sk |
| table stats: 2879987999 rows total |
| column stats: all |
| hosts=8 per-host-mem=88.00MB |
| tuple-ids=0 row-size=8B cardinality=2879987999 |
+----------------------------------------------------------+