IMPALA-1359

Incorrect plan after pushing predicate into inline view with FULL OUTER JOIN


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Works for Me
    • Impala 2.0
    • Product Backlog
    • None
    • commit 0d3cd410c9bf50a5b4658e299b247ac764f42b55
      Author: Nong Li <nong@cloudera.com>
      Date: Sun Oct 5 12:10:18 2014 -0700

          Log leaks on release builds.

    Description

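      The plan below is incorrect. The predicate "WHERE t3.boolean_col_1" from the outer query is pushed into the scan of t2 (node 01), but the FULL OUTER JOIN (node 02) is left unchanged and the predicate is not re-evaluated after the join. As a result, rows from t1 that have no match in the filtered t2 are emitted with a NULL t2.bool_col and are never filtered out.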

      Query: 
      explain 
      select t3.boolean_col_1, COUNT(1)
      FROM
        (SELECT t2.bool_col AS boolean_col_1
         FROM alltypes t1 
         FULL OUTER JOIN alltypestiny t2 ON t2.string_col = t1.string_col) t3
      WHERE t3.boolean_col_1
      GROUP BY t3.boolean_col_1
      +-----------------------------------------------------------+
      | Explain String                                            |
      +-----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=170.00MB VCores=2 |
      |                                                           |
      | 08:EXCHANGE [UNPARTITIONED]                               |
      | |                                                         |
      | 07:AGGREGATE [FINALIZE]                                   |
      | |  output: count:merge(1)                                 |
      | |  group by: t3.boolean_col_1                             |
      | |                                                         |
      | 06:EXCHANGE [HASH(t3.boolean_col_1)]                      |
      | |                                                         |
      | 03:AGGREGATE                                              |
      | |  output: count(1)                                       |
      | |  group by: t2.bool_col                                  |
      | |                                                         |
      | 02:HASH JOIN [FULL OUTER JOIN, PARTITIONED]               |
      | |  hash predicates: t1.string_col = t2.string_col         |
      | |                                                         |
      | |--05:EXCHANGE [HASH(t2.string_col)]                      |
      | |  |                                                      |
      | |  01:SCAN HDFS [functional.alltypestiny t2]              |
      | |     partitions=4/4 size=460B                            |
      | |     predicates: t2.bool_col                             |
      | |                                                         |
      | 04:EXCHANGE [HASH(t1.string_col)]                         |
      | |                                                         |
      | 00:SCAN HDFS [functional.alltypes t1]                     |
      |    partitions=24/24 size=478.45KB                         |
      +-----------------------------------------------------------+
      Fetched 27 row(s) in 0.01s
      

      The first row below (boolean_col_1 = NULL) should not be returned, because a NULL value does not satisfy "WHERE t3.boolean_col_1".

      Query: select t3.boolean_col_1, COUNT(1)
      FROM
      (SELECT t2.bool_col AS boolean_col_1
      FROM alltypes t1
      FULL OUTER JOIN alltypestiny t2 ON t2.string_col = t1.string_col) t3
      WHERE t3.boolean_col_1
      GROUP BY t3.boolean_col_1
      +---------------+----------+
      | boolean_col_1 | count(1) |
      +---------------+----------+
      | NULL          | 6570     |
      | true          | 2920     |
      +---------------+----------+
      Fetched 2 row(s) in 0.89s
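
The effect can be reproduced outside Impala with a small sketch. This is a toy model in plain Python, not Impala code: the helpers `full_outer_join` and `group_count` and the three-row and two-row tables are made up for illustration. It compares evaluating the predicate on the join output (expected) with filtering t2 at the scan while leaving the join as a full outer join (the reported plan).

```python
# Toy model of the query in this report. Tables are lists of
# (string_col, bool_col) tuples; the data is hypothetical.

def full_outer_join(left, right):
    """Full outer join on string_col; a missing side is padded with None."""
    out = []
    matched_right = set()
    for l in left:
        hit = False
        for i, r in enumerate(right):
            if l[0] == r[0]:
                out.append((l, r))
                matched_right.add(i)
                hit = True
        if not hit:
            out.append((l, None))       # unmatched left row, right side is NULL
    for i, r in enumerate(right):
        if i not in matched_right:
            out.append((None, r))       # unmatched right row, left side is NULL
    return out

def group_count(joined):
    """GROUP BY t2.bool_col with COUNT(1); the key is None when t2 is missing."""
    counts = {}
    for _, r in joined:
        key = r[1] if r is not None else None
        counts[key] = counts.get(key, 0) + 1
    return counts

t1 = [("0", True), ("1", False), ("2", True)]   # stands in for alltypes
t2 = [("0", True), ("1", False)]                # stands in for alltypestiny

# Expected semantics: evaluate WHERE t3.boolean_col_1 on the join output.
# A NULL bool_col does not satisfy the predicate.
expected = group_count(
    [(l, r) for l, r in full_outer_join(t1, t2) if r is not None and r[1]]
)

# Reported plan: filter t2 at the scan only and keep the full outer join.
buggy = group_count(full_outer_join(t1, [r for r in t2 if r[1]]))

print(expected)  # {True: 1}
print(buggy)     # {True: 1, None: 2}
```

In the sketch, filtering t2 before the join turns t1 rows that previously matched a false t2 row into unmatched rows, so they surface in the NULL group, which mirrors the extra NULL row in the result above.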
      

          People

            alex.behm Alexander Behm
            caseyc casey