Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
Description
If we take a simple query like such:
select avg(ss_quantity) from store_sales join household_demographics on (ss_hdemo_sk=hd_demo_sk) where (ss_sales_price between 0 and 100 and hd_dep_count = 1) or (ss_sales_price between 100 and 200 and hd_dep_count = 2);
and look at the plan we see that neither table scan has any predicates pushed to it, the only filter is in the join
(from impalad version 2.12.0-cdh5.16.x RELEASE (build 3f68649c7bf8a01fb6ba0cbe35dd2492adb836dd)
PLAN-ROOT SINK | 06:AGGREGATE [FINALIZE] | output: avg:merge(ss_quantity) | 05:EXCHANGE [UNPARTITIONED] | 03:AGGREGATE | output: avg(ss_quantity) | 02:HASH JOIN [INNER JOIN, BROADCAST] | hash predicates: ss_hdemo_sk = hd_demo_sk | other predicates: (ss_sales_price >= 0 AND ss_sales_price <= 100 AND hd_dep_count = 1) OR | (ss_sales_price >= 100 AND ss_sales_price <= 200 AND hd_dep_count = 2) | runtime filters: RF000 <- hd_demo_sk | |--04:EXCHANGE [BROADCAST] | | | 01:SCAN HDFS [tpcds_1000_parquet.household_demographics] | partitions=1/1 files=1 size=41.08KB | 00:SCAN HDFS [tpcds_1000_parquet.store_sales] partitions=1824/1824 files=1824 size=189.24GB runtime filters: RF000 -> ss_hdemo_sk
If we look at PostgreSQL 11.1, we'll see that not only does the join filter, but the table scan has the appropriate filters pushed to it.
Finalize Aggregate (cost=67549.69..67549.70 rows=1 width=32) -> Gather (cost=67549.47..67549.68 rows=2 width=32) Workers Planned: 2 -> Partial Aggregate (cost=66549.47..66549.48 rows=1 width=32) -> Hash Join (cost=113.12..66547.64 rows=734 width=4) Hash Cond: (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk) Join Filter: (((store_sales.ss_sales_price >= '0'::numeric) AND (store_sales.ss_sales_price <= '100'::numeric) AND (household_demographics.hd_dep_count = 1)) OR ((store_sales.ss_sales_price >= '100'::numeric) AND (store_sales.ss_sales_price <= '200'::numeric) AND (household_demographics.hd_dep_count = 2))) -> Parallel Seq Scan on store_sales (cost=0.00..66343.20 rows=7305 width=22) Filter: (((ss_sales_price >= '0'::numeric) AND (ss_sales_price <= '100'::numeric)) OR ((ss_sales_price >= '100'::numeric) AND (ss_sales_price <= '200'::numeric))) -> Hash (cost=112.62..112.62 rows=40 width=8) -> Seq Scan on household_demographics (cost=0.00..112.62 rows=40 width=8) Filter: ((hd_dep_count = 1) OR (hd_dep_count = 2))
Attachments
Issue Links
- is related to
-
IMPALA-9183 TPC-DS query 13 - customer_address predicates not propagated to scan
- Resolved
- relates to
-
IMPALA-4680 Join disjuncts are not simplified and pushed to the scan nodes when possible
- Resolved