Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
Impala 2.3.0
Description
Problem
IN subqueries may return wrong results if the left-hand side of the IN is a constant. For example,
select * from alltypestiny t1 where 10 not in (select sum(int_col) from alltypestiny)
Impala generates a wrong plan (see below):
+----------------------------------------------------------+ | Explain String | +----------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=42.00MB VCores=1 | | | | 07:EXCHANGE [UNPARTITIONED] | | | | | 03:CROSS JOIN [BROADCAST] | | | | | |--06:EXCHANGE [BROADCAST] | | | | | | | 05:AGGREGATE [FINALIZE] | | | | output: sum:merge(int_col) | | | | *having: 10 = sum(int_col) * | | | | | | | 04:EXCHANGE [UNPARTITIONED] | | | | | | | 02:AGGREGATE | | | | output: sum(int_col) | | | | | | | 01:SCAN HDFS [functional.alltypestiny t1] | | | partitions=4/4 files=4 size=460B | | | | | 00:SCAN HDFS [functional.alltypestiny t1] | | partitions=4/4 files=4 size=460B | +----------------------------------------------------------+
Workaround
As a workaround, wrap the constant into an inline view and reference the view column in the IN subquery. To fix the example above:
select * from alltypestiny t1, (select 10 tmp) v where tmp not in (select sum(int_col) from alltypestiny);
Query Gen Details
Impala returns 0 rows; Postgres returns 8 rows.
Impala Query:
SELECT COALESCE(t1.year, t1.id) AS int_col, COALESCE(t1.tinyint_col, t1.tinyint_col) AS int_col_1, IF(t1.bool_col, t1.float_col, t1.float_col) AS float_col FROM alltypestiny t1 WHERE (365.06) NOT IN (SELECT COALESCE(77, -188, COALESCE(-87, SUM(t1.id), -151)) AS int_col FROM alltypestiny t1 WHERE (t1.tinyint_col) IN (t1.month, -306.35) HAVING (MIN(t1.int_col)) >= (606))
Equivalent Postgres Query:
SELECT COALESCE(t1.year, t1.id) AS int_col, COALESCE(t1.tinyint_col, t1.tinyint_col) AS int_col_1, CASE WHEN t1.bool_col THEN t1.float_col ELSE t1.float_col END AS float_col FROM alltypestiny t1 WHERE (365.06) NOT IN (SELECT COALESCE(77, -188, COALESCE(-87, SUM(t1.id), -151)) AS int_col FROM alltypestiny t1 WHERE (t1.tinyint_col) IN (t1.month, -306.35) HAVING (MIN(t1.int_col)) >= (606))