Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2665

Incorrect assignment of On-clause predicate inside inline view with an outer join.

    XMLWordPrintableJSON

Details

    Description

      A query may return incorrect results due to wrong predicate assignment in the following scenario:
      1. There is an inline view that contains an outer join
      2. That inline view is joined with another table in the enclosing query block
      3. That join has an On-clause containing a predicate that only references columns originating from the outer-joined tables inside the inline view

      Example/Repro:

      create table t (c1 int, c2 int);
      insert into t values(1, 1);
      
      The following query returns wrong results. We expect an empty result set.
      
      select * from t t1 join
        (select a.c1, b.c2
         from t a left outer join t b
         on a.c1 = b.c2) v
      on v.c2 is null
      +----+----+----+------+
      | c1 | c2 | c1 | c2   |
      +----+----+----+------+
      | 1  | 1  | 1  | NULL |
      +----+----+----+------+
      
      Plan:
      +------------------------------------------------------------------------------------+
      | Explain String                                                                     |
      +------------------------------------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=4.06GB VCores=2                            |
      | WARNING: The following tables are missing relevant table and/or column statistics. |
      | default.t                                                                          |
      |                                                                                    |
      | 07:EXCHANGE [UNPARTITIONED]                                                        |
      | |                                                                                  |
      | 04:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]                                        |
      | |                                                                                  |
      | |--06:EXCHANGE [BROADCAST]                                                         |
      | |  |                                                                               |
      | |  03:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                                       |
      | |  |  hash predicates: a.c1 = b.c2                                                 |
      | |  |                                                                               |
      | |  |--05:EXCHANGE [BROADCAST]                                                      |
      | |  |  |                                                                            |
      | |  |  02:SCAN HDFS [default.t b]                                                   |
      | |  |     partitions=1/1 files=1 size=4B                                            |
      | |  |     predicates: b.c2 IS NULL      <--- This is the bug. Predicate should be assigned to the LEFT OUTER JOIN.                                            |
      | |  |                                                                               |
      | |  01:SCAN HDFS [default.t a]                                                      |
      | |     partitions=1/1 files=1 size=4B                                               |
      | |                                                                                  |
      | 00:SCAN HDFS [default.t t1]                                                        |
      |    partitions=1/1 files=1 size=4B                                                  |
      +------------------------------------------------------------------------------------+
      

      Workaround
      As a workaround the offending predicate from the On-clause can be moved into the WHERE clause. Note that this workaround is only correct will if the original On-clause belongs to an INNER or SEMI join.

      Query with workaround applied returns correct results.

      select * from t t1 join
        (select a.c1, b.c2
         from t a left outer join t b
         on a.c1 = b.c2) v
      where v.c2 is null
      
      Correctly returns an empty result set.
      
      Plan:
      +------------------------------------------------------------------------------------+
      | Explain String                                                                     |
      +------------------------------------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=4.06GB VCores=2                            |
      | WARNING: The following tables are missing relevant table and/or column statistics. |
      | default.t                                                                          |
      |                                                                                    |
      | 07:EXCHANGE [UNPARTITIONED]                                                        |
      | |                                                                                  |
      | 04:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]                                        |
      | |                                                                                  |
      | |--06:EXCHANGE [BROADCAST]                                                         |
      | |  |                                                                               |
      | |  03:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                                       |
      | |  |  hash predicates: a.c1 = b.c2                                                 |
      | |  |  other predicates: b.c2 IS NULL    <--- Correct assignment.                                           |
      | |  |                                                                               |
      | |  |--05:EXCHANGE [BROADCAST]                                                      |
      | |  |  |                                                                            |
      | |  |  02:SCAN HDFS [default.t b]                                                   |
      | |  |     partitions=1/1 files=1 size=4B                                            |
      | |  |                                                                               |
      | |  01:SCAN HDFS [default.t a]                                                      |
      | |     partitions=1/1 files=1 size=4B                                               |
      | |                                                                                  |
      | 00:SCAN HDFS [default.t t1]                                                        |
      |    partitions=1/1 files=1 size=4B                                                  |
      +------------------------------------------------------------------------------------+
      

      Attachments

        Activity

          People

            alex.behm Alexander Behm
            alex.behm Alexander Behm
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: