Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4373

Wrong results with correlated WHERE-clause subquery inside a NULL-checking conditional function.

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels:

      Description

      Impala may generate an incorrect plan for queries that have a correlated scalar subquery as a parameter to a NULL-checking conditional function like ISNULL().

      Example query and incorrect plan:

      select t1.int_col
      from functional.alltypessmall as t1
      where t1.int_col >= isnull
      (
         (
          SELECT 
           MAX(t2.bigint_col)
          FROM 
           functional.alltypestiny AS t2 
          WHERE 
           t1.id = t2.id + 10000
          ),
         0  
      )
      Fetched 0 row(s) in 1.09s
      
      Single-node plan:
      +-----------------------------------------------------------------------+
      | Explain String                                                        |
      +-----------------------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=0B VCores=0                   |
      |                                                                       |
      | PLAN-ROOT SINK                                                        |
      | |                                                                     |
      | 03:HASH JOIN [LEFT SEMI JOIN]                                         |
      | |  hash predicates: t1.id = t2.id + 10000                             |
      | |  other join predicates: t1.int_col >= isnull(max(t2.bigint_col), 0) |
      | |  runtime filters: RF000 <- t2.id + 10000                            |
      | |                                                                     |
      | |--02:AGGREGATE [FINALIZE]                                            |
      | |  |  output: max(t2.bigint_col)                                      |
      | |  |  group by: t2.id                                                 |
      | |  |                                                                  |
      | |  01:SCAN HDFS [functional.alltypestiny t2]                          |
      | |     partitions=4/4 files=4 size=460B                                |
      | |                                                                     |
      | 00:SCAN HDFS [functional.alltypessmall t1]                            |
      |    partitions=4/4 files=4 size=6.32KB                                 |
      |    runtime filters: RF000 -> t1.id                                    |
      +-----------------------------------------------------------------------+
      

      The query returns an empty result set but instead should return all rows from t1 because all invocations of the subquery return NULL, and all rows from t1 satisfy "t1.int_col >= 0".

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              alex.behm Alexander Behm
            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated: