Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-6286

Wrong results with outer join and RUNTIME_FILTER_MODE=GLOBAL

    Details

    • Epic Color:
      ghx-label-6

      Description

      Queries with the following characteristics may produce wrong results due to an incorrectly assigned runtime filter:

      • The query option RUNTIME_FILTER_MODE is set to GLOBAL
      • The query has an outer join
      • A scan on the nullable side of that outer join has a runtime filter with a NULL-checking expression such as COALESCE/IFNULL/CASE
      • The latter point imples that there is another join above the outer join with a NULL-checking expression in it's join condition

      Reproduction:

      select count(*) from functional.alltypestiny t1
      left outer join functional.alltypestiny t2
        on t1.id = t2.id
      where coalesce(t2.id + 10, 100) in (select 100)
      +----------+
      | count(*) |
      +----------+
      | 8        |
      +----------+
      

      We expect a count of 0. A count of 8 is incorrect.

      Query plan:

      +---------------------------------------------------------------+
      | Explain String                                                |
      +---------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=3.88MB              |
      | Per-Host Resource Estimates: Memory=87.88MB                   |
      | Codegen disabled by planner                                   |
      |                                                               |
      | PLAN-ROOT SINK                                                |
      | |                                                             |
      | 10:AGGREGATE [FINALIZE]                                       |
      | |  output: count:merge(*)                                     |
      | |                                                             |
      | 09:EXCHANGE [UNPARTITIONED]                                   |
      | |                                                             |
      | 05:AGGREGATE                                                  |
      | |  output: count(*)                                           |
      | |                                                             |
      | 04:HASH JOIN [LEFT SEMI JOIN, BROADCAST]                      |
      | |  hash predicates: coalesce(t2.id + 10, 100) = `$a$1`.`$c$1` |
      | |  runtime filters: RF000 <- `$a$1`.`$c$1`                    |
      | |                                                             |
      | |--08:EXCHANGE [BROADCAST]                                    |
      | |  |                                                          |
      | |  02:UNION                                                   |
      | |     constant-operands=1                                     |
      | |                                                             |
      | 03:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]                   |
      | |  hash predicates: t1.id = t2.id                             |
      | |                                                             |
      | |--07:EXCHANGE [HASH(t2.id)]                                  |
      | |  |                                                          |
      | |  01:SCAN HDFS [functional.alltypestiny t2]                  |
      | |     partitions=4/4 files=4 size=460B                        |
      | |     runtime filters: RF000 -> coalesce(t2.id + 10, 100)  <--- This runtime filter is not correct   |
      | |                                                             |
      | 06:EXCHANGE [HASH(t1.id)]                                     |
      | |                                                             |
      | 00:SCAN HDFS [functional.alltypestiny t1]                     |
      |    partitions=4/4 files=4 size=460B                           |
      +---------------------------------------------------------------+
      

      Explanation:

      • RF000 filters out all rows in scan 01
      • In join 03 there are no join matches since the right-hand is empty. All rows from the right-hand side are nulled.
      • The join condition in join 04 now satisfies all input rows because every "t2.id" is NULL, so after the COALESCE() the join condition becomes 100 = 100

      Workaround

      • Set RUNTIME_FILTER_MODE to LOCAL or OFF

        Attachments

          Activity

            People

            • Assignee:
              alex.behm Alexander Behm
              Reporter:
              alex.behm Alexander Behm
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: