Details
Bug
Status: Resolved
Blocker
Resolution: Fixed
Impala 2.5.0, Impala 2.6.0, Impala 2.7.0
Description
Queries with several AND-ed EXISTS subqueries in the WHERE clause may produce incorrect results if some of the subqueries can be evaluated at query compile time.
Repro with wrong plan:
select 1
from functional.alltypestiny t1
where not exists
  (select id
   from functional.alltypes t2
   where t1.int_col = t2.int_col limit 0)
and not exists <-- this subquery should be folded to "FALSE"
  (select min(int_col)
   from functional.alltypestiny t5
   where t1.id = t5.id and false)

+-----------------------------------------------------+
| Explain String                                      |
+-----------------------------------------------------+
| Estimated Per-Host Requirements: Memory=0B VCores=0 |
|                                                     |
| PLAN-ROOT SINK                                      |
| |                                                   |
| 00:SCAN HDFS [functional.alltypestiny t1]           |
|    partitions=4/4 files=4 size=460B                 |
+-----------------------------------------------------+
Same query as above but flipping the order of subqueries gives the correct plan:
select 1
from functional.alltypestiny t1
where not exists
  (select min(int_col)
   from functional.alltypestiny t5
   where t1.id = t5.id and false)
and not exists
  (select id
   from functional.alltypes t2
   where t1.int_col = t2.int_col limit 0)

+---------------------------------------------------------+
| Explain String                                          |
+---------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
|                                                         |
| PLAN-ROOT SINK                                          |
| |                                                       |
| 00:EMPTYSET                                             |
+---------------------------------------------------------+
The underlying problem is that we substitute out the subqueries with constant literals using an ExprSubstitutionMap, but the Subquery.equals() function is not implemented properly. As a result, two distinct subqueries can compare as equal during the substitution, so the second subquery is replaced with whatever boolean literal corresponds to the first subquery.
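
The failure mode can be illustrated with a minimal, self-contained Java sketch. It is not Impala's code: ToySubquery and ToySubstitutionMap are hypothetical stand-ins, the map is assumed to look entries up with equals(), and the stored value is assumed to be the boolean literal that each NOT EXISTS conjunct should fold to. The strictEquals flag only toggles between an over-permissive equals() and one that compares the subquery text.

```java
import java.util.ArrayList;
import java.util.List;

final class SubstitutionSketch {
    /** Hypothetical stand-in for a subquery expression (not Impala's Subquery class). */
    static final class ToySubquery {
        final String sql;
        final boolean strictEquals;

        ToySubquery(String sql, boolean strictEquals) {
            this.sql = sql;
            this.strictEquals = strictEquals;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ToySubquery)) return false;
            // Over-permissive variant: every subquery equals every other subquery.
            if (!strictEquals) return true;
            // Stricter variant: compare the subquery text.
            return sql.equals(((ToySubquery) o).sql);
        }

        @Override
        public int hashCode() {
            return strictEquals ? sql.hashCode() : 0;
        }
    }

    /** Hypothetical substitution map: the first entry whose key equals() the lookup key wins. */
    static final class ToySubstitutionMap {
        private final List<ToySubquery> lhs = new ArrayList<>();
        private final List<Boolean> rhs = new ArrayList<>();

        void put(ToySubquery key, boolean literal) {
            lhs.add(key);
            rhs.add(literal);
        }

        Boolean get(ToySubquery key) {
            for (int i = 0; i < lhs.size(); i++) {
                if (lhs.get(i).equals(key)) return rhs.get(i);
            }
            return null;
        }
    }

    /** Returns the literals substituted for the first and second conjunct of the repro. */
    static List<Boolean> substitute(boolean strictEquals) {
        ToySubquery limitZero = new ToySubquery(
            "select id from functional.alltypes t2 where t1.int_col = t2.int_col limit 0",
            strictEquals);
        ToySubquery aggregate = new ToySubquery(
            "select min(int_col) from functional.alltypestiny t5 where t1.id = t5.id and false",
            strictEquals);

        ToySubstitutionMap smap = new ToySubstitutionMap();
        smap.put(limitZero, true);   // NOT EXISTS over a LIMIT 0 subquery is TRUE
        smap.put(aggregate, false);  // NOT EXISTS over an aggregate without GROUP BY is FALSE

        List<Boolean> result = new ArrayList<>();
        result.add(smap.get(limitZero));
        result.add(smap.get(aggregate));
        return result;
    }

    public static void main(String[] args) {
        System.out.println("permissive equals: " + substitute(false));
        System.out.println("strict equals:     " + substitute(true));
    }
}
```

With the permissive equals(), the lookup for the second subquery hits the entry of the first one, so both conjuncts become TRUE and the predicate disappears, which matches the unfiltered scan in the wrong plan. In this toy model, inserting the aggregate subquery first would map both conjuncts to FALSE, which is consistent with the flipped query producing an EMPTYSET plan even though the equality check is still wrong.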