Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3126

Incorrect assignment of an inner join On-clause predicate through an outer join.

    Details

      Description

      Impala may return incorrect results for queries that have the following properties:

      • There is an INNER JOIN following a series of OUTER JOINs
      • The INNER JOIN has an On-clause with a predicate that references at least two tables that are on the nullable side of the preceding OUTER JOINs

      Query to repro and its plan:

      select 1 from functional.alltypes a
      left outer join functional.alltypes b
        on a.id = b.id
      left outer join functional.alltypes c
        on b.id = c.id
      right outer join functional.alltypes d
        on c.id = d.id
      inner join functional.alltypes e
        on b.int_col = c.int_col
      
      +-----------------------------------------------------------+
      | Explain String                                            |
      +-----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=480.04MB VCores=4 |
      |                                                           |
      | 14:EXCHANGE [UNPARTITIONED]                               |
      | |                                                         |
      | 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]               |
      | |                                                         |
      | |--13:EXCHANGE [BROADCAST]                                |
      | |  |                                                      |
      | |  04:SCAN HDFS [functional.alltypes e]                   |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |                                                         |
      | 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]              |
      | |  hash predicates: c.id = d.id                           |
      | |  runtime filters: RF000 <- d.id                         |
      | |                                                         |
      | |--12:EXCHANGE [HASH(d.id)]                               |
      | |  |                                                      |
      | |  03:SCAN HDFS [functional.alltypes d]                   |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |                                                         |
      | 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]               |
      | |  hash predicates: b.id = c.id                           |
      | |  other predicates: b.int_col = c.int_col     <--- incorrect placement; should be at node 07 or 08           |
      | |  runtime filters: RF001 <- c.int_col                    |
      | |                                                         |
      | |--11:EXCHANGE [HASH(c.id)]                               |
      | |  |                                                      |
      | |  02:SCAN HDFS [functional.alltypes c]                   |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |     runtime filters: RF000 -> c.id                      |
      | |                                                         |
      | 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]              |
      | |  hash predicates: b.id = a.id                           |
      | |  runtime filters: RF002 <- a.id                         |
      | |                                                         |
      | |--10:EXCHANGE [HASH(a.id)]                               |
      | |  |                                                      |
      | |  00:SCAN HDFS [functional.alltypes a]                   |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |                                                         |
      | 09:EXCHANGE [HASH(b.id)]                                  |
      | |                                                         |
      | 01:SCAN HDFS [functional.alltypes b]                      |
      |    partitions=24/24 files=24 size=478.45KB                |
      |    runtime filters: RF001 -> b.int_col, RF002 -> b.id     |
      +-----------------------------------------------------------+
      

      Workaround
      For some queries, this problem can be worked around by placing the problematic On-clause predicate in the WHERE clause instead, or changing the preceding OUTER JOINS to INNER JOINS (if the On-clause predicate would discard NULLs).

      To fix the example query above:

      select 1 from functional.alltypes a
      left outer join functional.alltypes b
        on a.id = b.id
      left outer join functional.alltypes c
        on b.id = c.id
      right outer join functional.alltypes d
        on c.id = d.id
      inner join functional.alltypes e
      where b.int_col = c.int_col
      
      +-----------------------------------------------------------+
      | Explain String                                            |
      +-----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=480.04MB VCores=4 |
      |                                                           |
      | 14:EXCHANGE [UNPARTITIONED]                               |
      | |                                                         |
      | 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]               |
      | |                                                         |
      | |--13:EXCHANGE [BROADCAST]                                |
      | |  |                                                      |
      | |  04:SCAN HDFS [functional.alltypes e]                   |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |                                                         |
      | 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]              |
      | |  hash predicates: c.id = d.id                           |
      | |  other predicates: b.int_col = c.int_col          <-- correct assignment      |
      | |  runtime filters: RF000 <- d.id                         |
      | |                                                         |
      | |--12:EXCHANGE [HASH(d.id)]                               |
      | |  |                                                      |
      | |  03:SCAN HDFS [functional.alltypes d]                   |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |                                                         |
      | 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]               |
      | |  hash predicates: b.id = c.id                           |
      | |                                                         |
      | |--11:EXCHANGE [HASH(c.id)]                               |
      | |  |                                                      |
      | |  02:SCAN HDFS [functional.alltypes c]                   |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |     runtime filters: RF000 -> c.id                      |
      | |                                                         |
      | 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]              |
      | |  hash predicates: b.id = a.id                           |
      | |  runtime filters: RF001 <- a.id                         |
      | |                                                         |
      | |--10:EXCHANGE [HASH(a.id)]                               |
      | |  |                                                      |
      | |  00:SCAN HDFS [functional.alltypes a]                   |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |                                                         |
      | 09:EXCHANGE [HASH(b.id)]                                  |
      | |                                                         |
      | 01:SCAN HDFS [functional.alltypes b]                      |
      |    partitions=24/24 files=24 size=478.45KB                |
      |    runtime filters: RF001 -> b.id                         |
      +-----------------------------------------------------------+
      

        Attachments

          Activity

            People

            • Assignee:
              alex.behm Alexander Behm
              Reporter:
              alex.behm Alexander Behm
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: