Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3125

Incorrect assignment of outer join On-clause that only references one side of the join.

    XMLWordPrintableJSON

Details

    Description

      Impala may return incorrect results for queries that have an outer join with an On-clause predicate that references at least two tables, but not the right-hand side of the join.

      Example query to repro and its plan:

      select a.id aid, b.id bid, a.int_col aint, b.int_col bint
      from functional.alltypes a
      inner join functional.alltypes b
        on a.int_col = b.int_col
      left outer join functional.alltypes c
        on a.id = b.id and b.bigint_col = c.bigint_col
      
      +-----------------------------------------------------------+
      | Explain String                                            |
      +-----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=320.08MB VCores=3 |
      |                                                           |
      | 08:EXCHANGE [UNPARTITIONED]                               |
      | |                                                         |
      | 04:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                 |
      | |  hash predicates: b.bigint_col = c.bigint_col         <---  b.id = a.id should be here and not in join below|
      | |                                                         |
      | |--07:EXCHANGE [BROADCAST]                                |
      | |  |                                                      |
      | |  02:SCAN HDFS [functional.alltypes c]                   |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |                                                         |
      | 03:HASH JOIN [INNER JOIN, PARTITIONED]                    |
      | |  hash predicates: b.int_col = a.int_col, b.id = a.id    |
      | |  runtime filters: RF000 <- a.int_col, RF001 <- a.id     |
      | |                                                         |
      | |--06:EXCHANGE [HASH(a.int_col,a.id)]                     |
      | |  |                                                      |
      | |  00:SCAN HDFS [functional.alltypes a]                   |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |                                                         |
      | 05:EXCHANGE [HASH(b.int_col,b.id)]                        |
      | |                                                         |
      | 01:SCAN HDFS [functional.alltypes b]                      |
      |    partitions=24/24 files=24 size=478.45KB                |
      |    runtime filters: RF000 -> b.int_col, RF001 -> b.id     |
      +-----------------------------------------------------------+
      

      Attachments

        Activity

          People

            alex.behm Alexander Behm
            alex.behm Alexander Behm
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: