Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-1459

Incorrect assignment of predicates through an outer join in an inline view.

    Details

      Description

      This query should not assign "a.id < b.id" through the outer join:

      Query: explain select 1 from (select a.id aid, b.id bid from
      functional.alltypes a inner join functional.alltypes b
      on a.id = b.id
      full outer join functional.alltypessmall c on a.id = c.id) v
      inner join functional.alltypestiny c on (aid < bid and aid = c.id)
      +-----------------------------------------------------------+
      | Explain String                                            |
      +-----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=224.01MB VCores=4 |
      |                                                           |
      | 11:EXCHANGE [UNPARTITIONED]                               |
      | |                                                         |
      | 06:HASH JOIN [INNER JOIN, PARTITIONED]                    |
      | |  hash predicates: a.id = c.id                           |
      | |                                                         |
      | |--10:EXCHANGE [HASH(c.id)]                               |
      | |  |                                                      |
      | |  05:SCAN HDFS [functional.alltypestiny c]               |
      | |     partitions=4/4 files=4 size=460B                    |
      | |                                                         |
      | 04:HASH JOIN [FULL OUTER JOIN, PARTITIONED]               |
      | |  hash predicates: a.id = c.id                           |
      | |                                                         |
      | |--09:EXCHANGE [HASH(c.id)]                               |
      | |  |                                                      |
      | |  02:SCAN HDFS [functional.alltypessmall c]              |
      | |     partitions=4/4 files=4 size=6.32KB                  |
      | |                                                         |
      | 03:HASH JOIN [INNER JOIN, PARTITIONED]                    |
      | |  hash predicates: a.id = b.id                           |
      | |  other predicates: a.id < b.id                          | <--- Wrong!
      | |                                                         |
      | |--08:EXCHANGE [HASH(b.id)]                               |
      | |  |                                                      |
      | |  01:SCAN HDFS [functional.alltypes b]                   |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |                                                         |
      | 07:EXCHANGE [HASH(a.id)]                                  |
      | |                                                         |
      | 00:SCAN HDFS [functional.alltypes a]                      |
      |    partitions=24/24 files=24 size=478.45KB                |
      +-----------------------------------------------------------+
      

      To understand the root cause better, consider this query with an inadvertently correct plan (the plan being correct relies on a bug):

      Query: explain select 1 from (select a.id aid, b.id bid from
      functional.alltypes a inner join functional.alltypes b
      on a.id = b.id
      full outer join functional.alltypessmall c on a.id = c.id) v
      where aid < bid
      +-----------------------------------------------------------+
      | Explain String                                            |
      +-----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=192.01MB VCores=3 |
      |                                                           |
      | 08:EXCHANGE [UNPARTITIONED]                               |
      | |                                                         |
      | 04:HASH JOIN [FULL OUTER JOIN, PARTITIONED]               |
      | |  hash predicates: a.id = c.id                           |
      | |  other predicates: a.id < b.id                          | <--- Correct
      | |                                                         |
      | |--07:EXCHANGE [HASH(c.id)]                               |
      | |  |                                                      |
      | |  02:SCAN HDFS [functional.alltypessmall c]              |
      | |     partitions=4/4 files=4 size=6.32KB                  |
      | |                                                         |
      | 03:HASH JOIN [INNER JOIN, PARTITIONED]                    |
      | |  hash predicates: a.id = b.id                           |
      | |                                                         |
      | |--06:EXCHANGE [HASH(b.id)]                               |
      | |  |                                                      |
      | |  01:SCAN HDFS [functional.alltypes b]                   |
      | |     partitions=24/24 files=24 size=478.45KB             |
      | |                                                         |
      | 05:EXCHANGE [HASH(a.id)]                                  |
      | |                                                         |
      | 00:SCAN HDFS [functional.alltypes a]                      |
      |    partitions=24/24 files=24 size=478.45KB                |
      +-----------------------------------------------------------+
      

      The latter plan works because we don't reset isWhereClauseConjunct when migrating the predicate into the inline view. However, leaving isWhereClauseConjunct set is actually incorrect (it should be migrated as a Having-clause conjunct)

        Attachments

          Activity

            People

            • Assignee:
              alex.behm Alexander Behm
              Reporter:
              alex.behm Alexander Behm
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: