Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
Impala 2.0
-
None
Description
This query should not assign "a.id < b.id" through the outer join:
Query: explain select 1 from (select a.id aid, b.id bid from functional.alltypes a inner join functional.alltypes b on a.id = b.id full outer join functional.alltypessmall c on a.id = c.id) v inner join functional.alltypestiny c on (aid < bid and aid = c.id) +-----------------------------------------------------------+ | Explain String | +-----------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=224.01MB VCores=4 | | | | 11:EXCHANGE [UNPARTITIONED] | | | | | 06:HASH JOIN [INNER JOIN, PARTITIONED] | | | hash predicates: a.id = c.id | | | | | |--10:EXCHANGE [HASH(c.id)] | | | | | | | 05:SCAN HDFS [functional.alltypestiny c] | | | partitions=4/4 files=4 size=460B | | | | | 04:HASH JOIN [FULL OUTER JOIN, PARTITIONED] | | | hash predicates: a.id = c.id | | | | | |--09:EXCHANGE [HASH(c.id)] | | | | | | | 02:SCAN HDFS [functional.alltypessmall c] | | | partitions=4/4 files=4 size=6.32KB | | | | | 03:HASH JOIN [INNER JOIN, PARTITIONED] | | | hash predicates: a.id = b.id | | | other predicates: a.id < b.id | <--- Wrong! | | | | |--08:EXCHANGE [HASH(b.id)] | | | | | | | 01:SCAN HDFS [functional.alltypes b] | | | partitions=24/24 files=24 size=478.45KB | | | | | 07:EXCHANGE [HASH(a.id)] | | | | | 00:SCAN HDFS [functional.alltypes a] | | partitions=24/24 files=24 size=478.45KB | +-----------------------------------------------------------+
To understand the root cause better, consider this query with an inadvertently correct plan (the plan being correct relies on a bug):
Query: explain select 1 from (select a.id aid, b.id bid from functional.alltypes a inner join functional.alltypes b on a.id = b.id full outer join functional.alltypessmall c on a.id = c.id) v where aid < bid +-----------------------------------------------------------+ | Explain String | +-----------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=192.01MB VCores=3 | | | | 08:EXCHANGE [UNPARTITIONED] | | | | | 04:HASH JOIN [FULL OUTER JOIN, PARTITIONED] | | | hash predicates: a.id = c.id | | | other predicates: a.id < b.id | <--- Correct | | | | |--07:EXCHANGE [HASH(c.id)] | | | | | | | 02:SCAN HDFS [functional.alltypessmall c] | | | partitions=4/4 files=4 size=6.32KB | | | | | 03:HASH JOIN [INNER JOIN, PARTITIONED] | | | hash predicates: a.id = b.id | | | | | |--06:EXCHANGE [HASH(b.id)] | | | | | | | 01:SCAN HDFS [functional.alltypes b] | | | partitions=24/24 files=24 size=478.45KB | | | | | 05:EXCHANGE [HASH(a.id)] | | | | | 00:SCAN HDFS [functional.alltypes a] | | partitions=24/24 files=24 size=478.45KB | +-----------------------------------------------------------+
The latter plan works because we don't reset isWhereClauseConjunct when migrating the predicate into the inline view. However, leaving isWhereClauseConjunct set is actually incorrect (it should be migrated as a Having-clause conjunct)