Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-6329

Wrong results for complex query with CTE, limit, group by and left join

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Not A Bug
    • Affects Version/s: Impala 2.10.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels:

      Description

      Impala may generate an incorrect plan for complex query.
      (see NULL in id and a_id columns)
      Can get correct result with commented lines (1, 2, 3, 4, 5)
      Example query and incorrect plan:

      with test as (
        select id, b as b from(
         select 1 as id , 10 as b union all
         select 2 as id , 20 as b union all
         select 3 as id , 30 as b union all
         select 4 as id , 40 as b union all
         select 5 as id , 50 as b
        ) t 
        group by id, b --1
        limit 3 --2
      ), 
      test2 as (
        select 1 as id, 10 as a_id union all
        select 2, 10 union all
        select 3, 20 union all
        select 4, 20 union all
        select 5, 30 union all
        select 6, 40
      )
      select * from test 
      left --3
      join
      (select id , a_id
         from (select id, a_id
                 from test2
                where id in (select id from test) --4
                group by id, a_id) t
        group by id, a_id --5
      ) e on test.id = e.id;
      

      Result:

      +----+----+------+------+
      | id | b  | id   | a_id |
      +----+----+------+------+
      | 2  | 20 | NULL | NULL |
      | 3  | 30 | 3    | 20   |
      | 5  | 50 | 5    | 30   |
      +----+----+------+------+
      

      Plan:

      +--------------------------------------------------+
      | Explain String                                   |
      +--------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=9.69MB |
      | Per-Host Resource Estimates: Memory=41.94MB      |
      | Codegen disabled by planner                      |
      |                                                  |
      | PLAN-ROOT SINK                                   |
      | |                                                |
      | 08:HASH JOIN [LEFT OUTER JOIN, BROADCAST]        |
      | |  hash predicates: id = id                      |
      | |                                                |
      | |--10:EXCHANGE [UNPARTITIONED]                   |
      | |  |                                             |
      | |  07:AGGREGATE [FINALIZE]                       |
      | |  |  group by: id, a_id                         |
      | |  |                                             |
      | |  06:AGGREGATE [FINALIZE]                       |
      | |  |  group by: id, a_id                         |
      | |  |                                             |
      | |  05:HASH JOIN [LEFT SEMI JOIN, BROADCAST]      |
      | |  |  hash predicates: id = id                   |
      | |  |                                             |
      | |  |--09:EXCHANGE [UNPARTITIONED]                |
      | |  |  |                                          |
      | |  |  04:AGGREGATE [FINALIZE]                    |
      | |  |  |  group by: id, b                         |
      | |  |  |  limit: 3                                |
      | |  |  |                                          |
      | |  |  03:UNION                                   |
      | |  |     constant-operands=5                     |
      | |  |                                             |
      | |  02:UNION                                      |
      | |     constant-operands=6                        |
      | |                                                |
      | 01:AGGREGATE [FINALIZE]                          |
      | |  group by: id, b                               |
      | |  limit: 3                                      |
      | |                                                |
      | 00:UNION                                         |
      |    constant-operands=5                           |
      +--------------------------------------------------+
      

      Correct result:

      with test as (
        select id, b as b from(
         select 1 as id , 10 as b union all
         select 2 as id , 20 as b union all
         select 3 as id , 30 as b union all
         select 4 as id , 40 as b union all
         select 5 as id , 50 as b
        ) t 
        --group by id, b --1
        limit 3 --2
      ), 
      test2 as (
        select 1 as id, 10 as a_id union all
        select 2, 10 union all
        select 3, 20 union all
        select 4, 20 union all
        select 5, 30 union all
        select 6, 40
      )
      select * from test left join
      (select id , a_id
         from (select id, a_id
                 from test2
                where id in (select id from test) --3
                group by id, a_id) t
        group by id, a_id --4
      ) e on test.id = e.id;
      

      Result:

      +----+----+----+------+
      | id | b  | id | a_id |
      +----+----+----+------+
      | 1  | 10 | 1  | 10   |
      | 2  | 20 | 2  | 10   |
      | 3  | 30 | 3  | 20   |
      +----+----+----+------+
      
      +--------------------------------------------------+
      | Explain String                                   |
      +--------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=5.81MB |
      | Per-Host Resource Estimates: Memory=21.94MB      |
      | Codegen disabled by planner                      |
      |                                                  |
      | PLAN-ROOT SINK                                   |
      | |                                                |
      | 06:HASH JOIN [LEFT OUTER JOIN, BROADCAST]        |
      | |  hash predicates: id = id                      |
      | |                                                |
      | |--08:EXCHANGE [UNPARTITIONED]                   |
      | |  |                                             |
      | |  05:AGGREGATE [FINALIZE]                       |
      | |  |  group by: id, a_id                         |
      | |  |                                             |
      | |  04:AGGREGATE [FINALIZE]                       |
      | |  |  group by: id, a_id                         |
      | |  |                                             |
      | |  03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]      |
      | |  |  hash predicates: id = id                   |
      | |  |                                             |
      | |  |--07:EXCHANGE [UNPARTITIONED]                |
      | |  |  |                                          |
      | |  |  02:UNION                                   |
      | |  |     constant-operands=5                     |
      | |  |     limit: 3                                |
      | |  |                                             |
      | |  01:UNION                                      |
      | |     constant-operands=6                        |
      | |                                                |
      | 00:UNION                                         |
      |    constant-operands=5                           |
      |    limit: 3                                      |
      +--------------------------------------------------+
      

        Attachments

          Activity

            People

            • Assignee:
              alex.behm Alexander Behm
              Reporter:
              minouminou Alex
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: