Details
-
Bug
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
Impala 2.5.0
Description
Today the planner creates left deep trees where joins are ordered based on selectivity.
The proposal is to add a rule in the planner that traverses the plan after createSingleNodePlan and flip the build and probe sides if the cardinality suggests so, which produces a bushy plan.
This optimization should improve queries against normalized schemas with selective joins and multiple fact tables, more "efficient" runtime filters will be created as a result of the plan change
The query below generates a left deep plan, where are a bush plan should be created
query
select count(*) from
orders,supplier, lineitem
where
l_orderkey = o_orderkey and
s_suppkey = l_suppkey and
s_name="Supplier#000000001";
Plan
10:AGGREGATE [FINALIZE]
| output: count:merge(*)
| hosts=15 per-host-mem=unavailable
| tuple-ids=3 row-size=8B cardinality=1
|
09:EXCHANGE [UNPARTITIONED]
| hosts=15 per-host-mem=unavailable
| tuple-ids=3 row-size=8B cardinality=1
|
05:AGGREGATE
| output: count(*)
| hosts=15 per-host-mem=10.00MB
| tuple-ids=3 row-size=8B cardinality=1
|
04:HASH JOIN [INNER JOIN, PARTITIONED]
| hash predicates: l_orderkey = o_orderkey
| runtime filters: RF000 <- o_orderkey
| hosts=15 per-host-mem=251.77MB
| tuple-ids=2,1,0 row-size=66B cardinality=600
|
|--08:EXCHANGE [HASH(o_orderkey)]
| | hosts=15 per-host-mem=0B
| | tuple-ids=0 row-size=8B cardinality=450000000
| |
| 00:SCAN HDFS [tpch_300_parquet.orders, RANDOM]
| partitions=1/1 files=81 size=18.82GB
| table stats: 450000000 rows total
| column stats: all
| hosts=15 per-host-mem=88.00MB
| tuple-ids=0 row-size=8B cardinality=450000000
|
07:EXCHANGE [HASH(l_orderkey)]
| hosts=15 per-host-mem=0B
| tuple-ids=2,1 row-size=58B cardinality=600
|
03:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: l_suppkey = s_suppkey
| runtime filters: RF001 <- s_suppkey
| hosts=15 per-host-mem=47B
| tuple-ids=2,1 row-size=58B cardinality=600
|
|--06:EXCHANGE [BROADCAST]
| | hosts=9 per-host-mem=0B
| | tuple-ids=1 row-size=42B cardinality=1
| |
| 01:SCAN HDFS [tpch_300_parquet.supplier, RANDOM]
| partitions=1/1 files=9 size=233.61MB
| predicates: s_name = 'Supplier#000000001'
| table stats: 3000000 rows total
| column stats: all
| hosts=9 per-host-mem=80.00MB
| tuple-ids=1 row-size=42B cardinality=1
|
02:SCAN HDFS [tpch_300_parquet.lineitem, RANDOM]
partitions=1/1 files=264 size=64.36GB
runtime filters: RF000 -> l_orderkey, RF001 -> l_suppkey
table stats: 1799989091 rows total
column stats: all
hosts=15 per-host-mem=176.00MB
tuple-ids=2 row-size=16B cardinality=1799989091