# Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate

XMLWordPrintableJSON

#### Details

• Bug
• Status: Resolved
• Blocker
• Resolution: Fixed
• Impala 2.3.0

#### Description

Problem
IN subqueries may return wrong results if the left-hand side of the IN is a constant. For example,

```select * from alltypestiny t1
where 10 not in (select sum(int_col) from alltypestiny)
```

Impala generates a wrong plan (see below):

```+----------------------------------------------------------+
| Explain String                                           |
+----------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=42.00MB VCores=1 |
|                                                          |
| 07:EXCHANGE [UNPARTITIONED]                              |
| |                                                        |
| |                                                        |
| |  |                                                     |
| |  05:AGGREGATE [FINALIZE]                               |
| |  |  output: sum:merge(int_col)                         |
| |  |  *having: 10 = sum(int_col) *                        |
| |  |                                                     |
| |  04:EXCHANGE [UNPARTITIONED]                           |
| |  |                                                     |
| |  02:AGGREGATE                                          |
| |  |  output: sum(int_col)                               |
| |  |                                                     |
| |  01:SCAN HDFS [functional.alltypestiny t1]             |
| |     partitions=4/4 files=4 size=460B                   |
| |                                                        |
| 00:SCAN HDFS [functional.alltypestiny t1]                |
|    partitions=4/4 files=4 size=460B                      |
+----------------------------------------------------------+
```

Workaround
As a workaround, wrap the constant into an inline view and reference the view column in the IN subquery. To fix the example above:

```select * from alltypestiny t1, (select 10 tmp) v
where tmp not in (select sum(int_col) from alltypestiny);
```

Query Gen Details

Impala returns 0 rows; Postgres returns 8 rows.

Impala Query:

```SELECT
COALESCE(t1.year, t1.id) AS int_col,
COALESCE(t1.tinyint_col, t1.tinyint_col) AS int_col_1,
IF(t1.bool_col, t1.float_col, t1.float_col) AS float_col
FROM alltypestiny t1
WHERE
(365.06) NOT IN (SELECT
COALESCE(77, -188, COALESCE(-87, SUM(t1.id), -151)) AS int_col
FROM alltypestiny t1
WHERE
(t1.tinyint_col) IN (t1.month, -306.35)
HAVING
(MIN(t1.int_col)) >= (606))
```

Equivalent Postgres Query:

```SELECT
COALESCE(t1.year, t1.id) AS int_col,
COALESCE(t1.tinyint_col, t1.tinyint_col) AS int_col_1,
CASE WHEN t1.bool_col THEN t1.float_col ELSE t1.float_col END AS float_col
FROM alltypestiny t1
WHERE
(365.06) NOT IN (SELECT
COALESCE(77, -188, COALESCE(-87, SUM(t1.id), -151)) AS int_col
FROM alltypestiny t1
WHERE
(t1.tinyint_col) IN (t1.month, -306.35)
HAVING
(MIN(t1.int_col)) >= (606))
```

#### People

Dimitris Tsirogiannis
Taras Bobrovytsky