Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Impala 1.2.1
-
None
-
None
Description
When moving from Hive computed column stats to Impala computed column stats (more accurate) the plan for TPCH-Q21 changed slightly. With updated column stats the left outer join was changed from broadcast to partitioned (which I believe is okay), but the plan change also split the left outer join and semi-join into different plan fragments. This left semi join fragment is now partitioned on l1.l_orderkey and then re-partitioned on the same l1.l_orderkey (via STREAM DATA SINK) which is unnecessary/redundant.
Query: explain select s_name, count(*) as numwait from lineitem l1 join supplier s on (s.s_suppkey = l1.l_suppkey) join orders o on (o.o_orderkey = l1.l_orderkey) join nation n on (s.s_nationkey = n.n_nationkey) left semi join lineitem l2 on (l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey) left outer join lineitem l3 on (l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate) where l3.l_orderkey is null and n_name = 'SAUDI ARABIA' and o_orderstatus = 'F' group by s_name order by numwait desc, s_name limit 100 +-----------------------------------------------------------+ | Explain String | +-----------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=282.53MB VCores=4 | | | | PLAN FRAGMENT 0 | | PARTITION: UNPARTITIONED | | | | 23:TOP-N | | | order by: COUNT(*) DESC, s_name ASC | | | limit: 100 | | | cardinality: 100 | | | per-host memory: unavailable | | | tuple ids: 6 | | | | | 22:EXCHANGE | | cardinality: 100 | | per-host memory: unavailable | | tuple ids: 6 | | | | PLAN FRAGMENT 1 | | PARTITION: HASH_PARTITIONED: s_name | | | | STREAM DATA SINK | | EXCHANGE ID: 22 | | UNPARTITIONED | | | | 12:TOP-N | | | order by: COUNT(*) DESC, s_name ASC | | | limit: 100 | | | cardinality: 100 | | | per-host memory: 2.34KB | | | tuple ids: 6 | | | | | 21:AGGREGATE (merge finalize) | | | output: SUM(COUNT(*)) | | | group by: s_name | | | cardinality: 10388 | | | per-host memory: 10.00MB | | | tuple ids: 6 | | | | | 20:EXCHANGE | | cardinality: 20776 | | per-host memory: 0B | | tuple ids: 6 | | | | PLAN FRAGMENT 2 | | PARTITION: HASH_PARTITIONED: l1.l_orderkey | | | | STREAM DATA SINK | | EXCHANGE ID: 20 | | HASH_PARTITIONED: s_name | | | | 11:AGGREGATE | | | output: COUNT(*) | | | group by: s_name | | | cardinality: 10388 | | | per-host memory: 10.00MB | | | tuple ids: 6 | | | | | 10:HASH JOIN | | | join op: LEFT OUTER JOIN (PARTITIONED) | | | hash predicates: | | | l1.l_orderkey = l3.l_orderkey | | | other join predicates: l3.l_suppkey != l1.l_suppkey | | | other predicates: l3.l_orderkey IS NULL | | | cardinality: 126211 | | | per-host memory: 644.66KB | | | tuple ids: 0 1 2 3 4 5N | | | | | |----19:EXCHANGE | | | cardinality: 600122 | | | per-host memory: 0B | | | tuple ids: 5 | | | | | 18:EXCHANGE | | cardinality: 317788 | | per-host memory: 0B | | tuple ids: 0 1 2 3 4 | | | | PLAN FRAGMENT 3 | | PARTITION: RANDOM | | | | STREAM DATA SINK | | EXCHANGE ID: 19 | | HASH_PARTITIONED: l3.l_orderkey | | | | 9:SCAN HDFS | | table=tpch.lineitem #partitions=1/1 size=718.94MB | | predicates: l3.l_receiptdate > l3.l_commitdate | | table stats: 6001215 rows total | | column stats: all | | cardinality: 600122 | | per-host memory: 264.00MB | | tuple ids: 5 | | | | PLAN FRAGMENT 4 | | PARTITION: HASH_PARTITIONED: l1.l_orderkey | | | | STREAM DATA SINK | | EXCHANGE ID: 18 | | HASH_PARTITIONED: l1.l_orderkey | | | | 8:HASH JOIN | | | join op: LEFT SEMI JOIN (PARTITIONED) | | | hash predicates: | | | l1.l_orderkey = l2.l_orderkey | | | other predicates: l2.l_suppkey != l1.l_suppkey | | | cardinality: 317788 | | | per-host memory: 6.30MB | | | tuple ids: 0 1 2 3 4 | | | | | |----17:EXCHANGE | | | cardinality: 6001215 | | | per-host memory: 0B | | | tuple ids: 4 | | | | | 16:EXCHANGE | | cardinality: 80016 | | per-host memory: 0B | | tuple ids: 0 1 2 3 | | | | PLAN FRAGMENT 5 | | PARTITION: RANDOM | | | | STREAM DATA SINK | | EXCHANGE ID: 17 | | HASH_PARTITIONED: l2.l_orderkey | | | | 7:SCAN HDFS | | table=tpch.lineitem #partitions=1/1 size=718.94MB | | table stats: 6001215 rows total | | column stats: all | | cardinality: 6001215 | | per-host memory: 264.00MB | | tuple ids: 4 | | | | PLAN FRAGMENT 6 | | PARTITION: RANDOM | | | | STREAM DATA SINK | | EXCHANGE ID: 16 | | HASH_PARTITIONED: l1.l_orderkey | | | | 6:HASH JOIN | | | join op: INNER JOIN (BROADCAST) | | | hash predicates: | | | s.s_nationkey = n.n_nationkey | | | cardinality: 80016 | | | per-host memory: 4B | | | tuple ids: 0 1 2 3 | | | | | |----15:EXCHANGE | | | cardinality: 1 | | | per-host memory: 0B | | | tuple ids: 3 | | | | | 4:HASH JOIN | | | join op: INNER JOIN (BROADCAST) | | | hash predicates: | | | l1.l_orderkey = o.o_orderkey | | | cardinality: 2000405 | | | per-host memory: 1.57MB | | | tuple ids: 0 1 2 | | | | | |----14:EXCHANGE | | | cardinality: 500000 | | | per-host memory: 0B | | | tuple ids: 2 | | | | | 2:HASH JOIN | | | join op: INNER JOIN (BROADCAST) | | | hash predicates: | | | l1.l_suppkey = s.s_suppkey | | | cardinality: 6001215 | | | per-host memory: 32.23KB | | | tuple ids: 0 1 | | | | | |----13:EXCHANGE | | | cardinality: 10000 | | | per-host memory: 0B | | | tuple ids: 1 | | | | | 0:SCAN HDFS | | table=tpch.lineitem #partitions=1/1 size=718.94MB | | table stats: 6001215 rows total | | column stats: all | | cardinality: 6001215 | | per-host memory: 264.00MB | | tuple ids: 0 | | | | PLAN FRAGMENT 7 | | PARTITION: RANDOM | | | | STREAM DATA SINK | | EXCHANGE ID: 15 | | UNPARTITIONED | | | | 5:SCAN HDFS | | table=tpch.nation #partitions=1/1 size=2.15KB | | predicates: n_name = 'SAUDI ARABIA' | | table stats: 25 rows total | | column stats: all | | cardinality: 1 | | per-host memory: 32.00MB | | tuple ids: 3 | | | | PLAN FRAGMENT 8 | | PARTITION: RANDOM | | | | STREAM DATA SINK | | EXCHANGE ID: 14 | | UNPARTITIONED | | | | 3:SCAN HDFS | | table=tpch.orders #partitions=1/1 size=162.56MB | | predicates: o_orderstatus = 'F' | | table stats: 1500000 rows total | | column stats: all | | cardinality: 500000 | | per-host memory: 176.00MB | | tuple ids: 2 | | | | PLAN FRAGMENT 9 | | PARTITION: RANDOM | | | | STREAM DATA SINK | | EXCHANGE ID: 13 | | UNPARTITIONED | | | | 1:SCAN HDFS | | table=tpch.supplier #partitions=1/1 size=1.33MB | | table stats: 10000 rows total | | column stats: all | | cardinality: 10000 | | per-host memory: 32.00MB | | tuple ids: 1 | +-----------------------------------------------------------+