Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Not A Bug
-
Impala 2.10.0
-
None
-
ghx-label-9
Description
Impala may generate an incorrect plan for complex query.
(see NULL in id and a_id columns)
Can get correct result with commented lines (1, 2, 3, 4, 5)
Example query and incorrect plan:
with test as ( select id, b as b from( select 1 as id , 10 as b union all select 2 as id , 20 as b union all select 3 as id , 30 as b union all select 4 as id , 40 as b union all select 5 as id , 50 as b ) t group by id, b --1 limit 3 --2 ), test2 as ( select 1 as id, 10 as a_id union all select 2, 10 union all select 3, 20 union all select 4, 20 union all select 5, 30 union all select 6, 40 ) select * from test left --3 join (select id , a_id from (select id, a_id from test2 where id in (select id from test) --4 group by id, a_id) t group by id, a_id --5 ) e on test.id = e.id;
Result:
+----+----+------+------+ | id | b | id | a_id | +----+----+------+------+ | 2 | 20 | NULL | NULL | | 3 | 30 | 3 | 20 | | 5 | 50 | 5 | 30 | +----+----+------+------+
Plan:
+--------------------------------------------------+
| Explain String |
+--------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=9.69MB |
| Per-Host Resource Estimates: Memory=41.94MB |
| Codegen disabled by planner |
| |
| PLAN-ROOT SINK |
| | |
| 08:HASH JOIN [LEFT OUTER JOIN, BROADCAST] |
| | hash predicates: id = id |
| | |
| |--10:EXCHANGE [UNPARTITIONED] |
| | | |
| | 07:AGGREGATE [FINALIZE] |
| | | group by: id, a_id |
| | | |
| | 06:AGGREGATE [FINALIZE] |
| | | group by: id, a_id |
| | | |
| | 05:HASH JOIN [LEFT SEMI JOIN, BROADCAST] |
| | | hash predicates: id = id |
| | | |
| | |--09:EXCHANGE [UNPARTITIONED] |
| | | | |
| | | 04:AGGREGATE [FINALIZE] |
| | | | group by: id, b |
| | | | limit: 3 |
| | | | |
| | | 03:UNION |
| | | constant-operands=5 |
| | | |
| | 02:UNION |
| | constant-operands=6 |
| | |
| 01:AGGREGATE [FINALIZE] |
| | group by: id, b |
| | limit: 3 |
| | |
| 00:UNION |
| constant-operands=5 |
+--------------------------------------------------+
Correct result:
with test as ( select id, b as b from( select 1 as id , 10 as b union all select 2 as id , 20 as b union all select 3 as id , 30 as b union all select 4 as id , 40 as b union all select 5 as id , 50 as b ) t --group by id, b --1 limit 3 --2 ), test2 as ( select 1 as id, 10 as a_id union all select 2, 10 union all select 3, 20 union all select 4, 20 union all select 5, 30 union all select 6, 40 ) select * from test left join (select id , a_id from (select id, a_id from test2 where id in (select id from test) --3 group by id, a_id) t group by id, a_id --4 ) e on test.id = e.id;
Result:
+----+----+----+------+ | id | b | id | a_id | +----+----+----+------+ | 1 | 10 | 1 | 10 | | 2 | 20 | 2 | 10 | | 3 | 30 | 3 | 20 | +----+----+----+------+
+--------------------------------------------------+
| Explain String |
+--------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=5.81MB |
| Per-Host Resource Estimates: Memory=21.94MB |
| Codegen disabled by planner |
| |
| PLAN-ROOT SINK |
| | |
| 06:HASH JOIN [LEFT OUTER JOIN, BROADCAST] |
| | hash predicates: id = id |
| | |
| |--08:EXCHANGE [UNPARTITIONED] |
| | | |
| | 05:AGGREGATE [FINALIZE] |
| | | group by: id, a_id |
| | | |
| | 04:AGGREGATE [FINALIZE] |
| | | group by: id, a_id |
| | | |
| | 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST] |
| | | hash predicates: id = id |
| | | |
| | |--07:EXCHANGE [UNPARTITIONED] |
| | | | |
| | | 02:UNION |
| | | constant-operands=5 |
| | | limit: 3 |
| | | |
| | 01:UNION |
| | constant-operands=6 |
| | |
| 00:UNION |
| constant-operands=5 |
| limit: 3 |
+--------------------------------------------------+