Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2089

Predicate from the ON clause of an inner join is dropped during planning

    XMLWordPrintableJSON

Details

    Description

      Impala returns 546 rows and Postgres returns 0 rows for the following query.

      Query:

      SELECT
      t2.timestamp_col,
      t1.int_col_1
      FROM (
      SELECT
      COALESCE(t1.smallint_col, t1.month, t1.month) AS int_col,
      (COUNT(t1.int_col)) <= (COALESCE(t1.smallint_col, t1.month, t1.month)) AS boolean_col,
      (t1.bigint_col) + (t1.smallint_col) AS int_col_1
      FROM alltypes t1
      GROUP BY
      COALESCE(t1.smallint_col, t1.month, t1.month),
      (t1.bigint_col) + (t1.smallint_col)
      HAVING
      ((t1.bigint_col) + (t1.smallint_col)) != (COUNT((t1.bigint_col) + (t1.smallint_col)))
      ) t1
      INNER JOIN alltypes t2 ON (((t2.month) = (t1.int_col)) AND ((t2.month) = (t1.int_col_1))) AND ((t2.tinyint_col) = (t1.int_col))
      WHERE
      (t2.int_col) IN (t1.int_col_1, t1.int_col)
      

      As we can see from the plan below, the predicate t2.month = t1.int_col_1 is dropped from the ON clause of the inner join:

      +---------------------------------------------------------------------------------------------------------------------------+
      | Explain String                                                                                                            |
      +---------------------------------------------------------------------------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=170.00MB VCores=2                                                                 |
      |                                                                                                                           |
      | 07:EXCHANGE [UNPARTITIONED]                                                                                               |
      | |                                                                                                                         |
      | 03:HASH JOIN [INNER JOIN, BROADCAST]                                                                                      |
      | |  hash predicates: (t2.month) = coalesce(t1.smallint_col, t1.month, t1.month)                                            |
      | |  other predicates: (t2.int_col) IN ((t1.bigint_col) + (t1.smallint_col), coalesce(t1.smallint_col, t1.month, t1.month)) |
      | |                                                                                                                         |
      | |--06:EXCHANGE [BROADCAST]                                                                                                |
      | |  |                                                                                                                      |
      | |  05:AGGREGATE [FINALIZE]                                                                                                |
      | |  |  output: (count:merge(t1.int_col)), (count:merge((t1.bigint_col) + (t1.smallint_col)))                               |
      | |  |  group by: coalesce(t1.smallint_col, t1.month, t1.month), (t1.bigint_col) + (t1.smallint_col)                        |
      | |  |  having: (t1.bigint_col) + (t1.smallint_col) != (count((t1.bigint_col) + (t1.smallint_col)))                         |
      | |  |                                                                                                                      |
      | |  04:EXCHANGE [HASH(coalesce(t1.smallint_col, t1.month, t1.month),(t1.bigint_col) + (t1.smallint_col))]                  |
      | |  |                                                                                                                      |
      | |  01:AGGREGATE                                                                                                           |
      | |  |  output: (count(t1.int_col)), (count((t1.bigint_col) + (t1.smallint_col)))                                           |
      | |  |  group by: coalesce(t1.smallint_col, t1.month, t1.month), (t1.bigint_col) + (t1.smallint_col)                        |
      | |  |                                                                                                                      |
      | |  00:SCAN HDFS [functional.alltypes t1]                                                                                  |
      | |     partitions=24/24 files=24 size=478.45KB                                                                             |
      | |                                                                                                                         |
      | 02:SCAN HDFS [functional.alltypes t2]                                                                                     |
      |    partitions=24/24 files=24 size=478.45KB                                                                                |
      |    predicates: t2.month = t2.tinyint_col                                                                                  |
      +---------------------------------------------------------------------------------------------------------------------------+
      

      Attachments

        Activity

          People

            alex.behm Alexander Behm
            tarasbob Taras Bobrovytsky
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: