Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-8386

Incorrect predicate in a left outer join query

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.12.0, Impala 3.2.0
    • Fix Version/s: Impala 3.3.0
    • Component/s: Frontend
    • Labels:
    • Epic Color:
      ghx-label-8

      Description

      skyyws <skyyws@163.com> reported a bug in the mailing list on the following data set:

      table A
      +------+
      | a_id |
      +------+
      | 1    |
      | 2    |
      +------+
      table B
      +------+--------+
      | b_id | amount |
      +------+--------+
      | 1    | 10     |
      | 1    | 20     |
      | 2    | NULL   |
      +------+--------+
      table C
      +------+------+
      | a_id | b_id |
      +------+------+
      | 1    | 1    |
      | 2    | 2    |
      +------+------+

      The following query returns a wrong result "1":

      select count(1) from (
          select t2.a_id,t2.amount1,t2.amount2
          from( select a_id from a) t1
          left outer join (
              select c.a_id,sum(amount) as amount1,sum(amount) as amount2
              from b join c  on b.b_id = c.b_id group by c.a_id) t2
          on t1.a_id = t2.a_id
      ) t;
      

      Removing "t2.amount2" can get the right result "2":

      select count(1) from (
          select t2.a_id,t2.amount1
          from( select a_id from a) t1
          left outer join (
              select c.a_id,sum(amount) as amount1,sum(amount) as amount2
              from b join c  on b.b_id = c.b_id group by c.a_id) t2
          on t1.a_id = t2.a_id
      ) t;
      

      The problem is that in query 1, Impala generates a wrong predicate "sum(amount) = sum(amount)" which rejects nulls:

      +-------------------------------------------------------------+
      | Explain String                                              |
      +-------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=10.84MB Threads=8 |
      | Per-Host Resource Estimates: Memory=143MB                   |
      | Codegen disabled by planner                                 |
      |                                                             |
      | PLAN-ROOT SINK                                              |
      | |                                                           |
      | 12:AGGREGATE [FINALIZE]                                     |
      | |  output: count:merge(*)                                   |
      | |  row-size=8B cardinality=1                                |
      | |                                                           |
      | 11:EXCHANGE [UNPARTITIONED]                                 |
      | |                                                           |
      | 06:AGGREGATE                                                |
      | |  output: count(*)                                         |
      | |  row-size=8B cardinality=1                                |
      | |                                                           |
      | 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]                |
      | |  hash predicates: c.a_id = a_id                           |
      | |  other predicates: sum(amount) = sum(amount)      <---------- Wrong inferred predicate which incorrectly reject nulls
      | |  runtime filters: RF000 <- a_id                           |
      | |  row-size=16B cardinality=2                               |
      | |                                                           |
      | |--10:EXCHANGE [HASH(a_id)]                                 |
      | |  |                                                        |
      | |  00:SCAN HDFS [default.a]                                 |
      | |     partitions=1/1 files=1 size=4B                        |
      | |     row-size=4B cardinality=2                             |
      | |                                                           |
      | 09:AGGREGATE [FINALIZE]                                     |
      | |  output: sum:merge(amount)                                |
      | |  group by: c.a_id                                         |
      | |  row-size=12B cardinality=2                               |
      | |                                                           |
      | 08:EXCHANGE [HASH(c.a_id)]                                  |
      | |                                                           |
      | 04:AGGREGATE [STREAMING]                                    |
      | |  output: sum(amount)                                      |
      | |  group by: c.a_id                                         |
      | |  row-size=12B cardinality=2                               |
      | |                                                           |
      | 03:HASH JOIN [INNER JOIN, BROADCAST]                        |
      | |  hash predicates: b.b_id = c.b_id                         |
      | |  runtime filters: RF002 <- c.b_id                         |
      | |  row-size=16B cardinality=3                               |
      | |                                                           |
      | |--07:EXCHANGE [BROADCAST]                                  |
      | |  |                                                        |
      | |  02:SCAN HDFS [default.c]                                 |
      | |     partitions=1/1 files=1 size=8B                        |
      | |     runtime filters: RF000 -> default.c.a_id              |
      | |     row-size=8B cardinality=2                             |
      | |                                                           |
      | 01:SCAN HDFS [default.b]                                    |
      |    partitions=1/1 files=1 size=15B                          |
      |    runtime filters: RF002 -> b.b_id                         |
      |    row-size=8B cardinality=3                                |
      +-------------------------------------------------------------+
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                stiga-huang Quanlong Huang
                Reporter:
                stiga-huang Quanlong Huang
              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: