IMPALA-4423

Wrong results with several conjunctive EXISTS subqueries that can be evaluated at query-compile time.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Frontend
    • Labels:

      Description

      Queries with several AND-ed EXISTS subqueries in the WHERE clause may produce incorrect results if some of the subqueries can be evaluated at query compile time.

      Repro with wrong plan:

      select 1
      from functional.alltypestiny t1
      where not exists
        (select id
         from functional.alltypes t2
         where t1.int_col = t2.int_col limit 0)
      and not exists -- this predicate should be folded to FALSE
        (select min(int_col)
         from functional.alltypestiny t5
         where t1.id = t5.id and false)
      
      +-----------------------------------------------------+
      | Explain String                                      |
      +-----------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=0B VCores=0 |
      |                                                     |
      | PLAN-ROOT SINK                                      |
      | |                                                   |
      | 00:SCAN HDFS [functional.alltypestiny t1]           |
      |    partitions=4/4 files=4 size=460B                 |
      +-----------------------------------------------------+
      

      The same query with the two subqueries in reverse order gives the correct plan:

      select 1
      from functional.alltypestiny t1
      where not exists
        (select min(int_col)
         from functional.alltypestiny t5
         where t1.id = t5.id and false)
      and not exists
        (select id
         from functional.alltypes t2
         where t1.int_col = t2.int_col limit 0)
      
      +---------------------------------------------------------+
      | Explain String                                          |
      +---------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
      |                                                         |
      | PLAN-ROOT SINK                                          |
      | |                                                       |
      | 00:EMPTYSET                                             |
      +---------------------------------------------------------+
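Both NOT EXISTS predicates in these repros can be decided without running the subqueries. The Java sketch below (Java 16+ for the record) spells out the two rules involved. `SubqueryShape`, `constantExists` and `constantNotExists` are hypothetical names, and the rules are a deliberately incomplete simplification (they ignore OFFSET, for example), not Impala's actual rewrite logic.

```java
import java.util.Optional;

public class ExistsFoldingSketch {
  /** Hypothetical summary of a subquery's shape (not an Impala class). */
  record SubqueryShape(Long limit, boolean hasAggregate, boolean hasGroupBy, boolean hasHaving) {}

  /**
   * Returns the compile-time value of EXISTS (subquery) when it can be decided
   * from the shape alone, or Optional.empty() when the subquery must run.
   */
  static Optional<Boolean> constantExists(SubqueryShape s) {
    // LIMIT 0 never returns a row, whatever the WHERE clause says.
    if (s.limit() != null && s.limit() == 0L) return Optional.of(false);
    // An aggregate without GROUP BY or HAVING returns exactly one row,
    // even when its WHERE clause (e.g. "... and false") filters out every input row.
    if (s.hasAggregate() && !s.hasGroupBy() && !s.hasHaving()) return Optional.of(true);
    return Optional.empty();
  }

  /** NOT EXISTS is the negation of EXISTS whenever the latter is constant. */
  static Optional<Boolean> constantNotExists(SubqueryShape s) {
    return constantExists(s).map(v -> !v);
  }

  public static void main(String[] args) {
    SubqueryShape limitZero = new SubqueryShape(0L, false, false, false);     // the t2 subquery
    SubqueryShape minNoGroupBy = new SubqueryShape(null, true, false, false); // the t5 subquery
    System.out.println(constantNotExists(limitZero));    // Optional[true]
    System.out.println(constantNotExists(minNoGroupBy)); // Optional[false]
  }
}
```

The correct folding of the WHERE clause is therefore TRUE AND FALSE, i.e. FALSE, which is why the flipped query is planned as an EMPTYSET.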
      

      The underlying problem is that we substitute the constant subqueries with boolean literals using an ExprSubstitutionMap, but Subquery.equals() is not implemented properly: the lookup for the second subquery matches the first subquery's entry in the map, so the second subquery is replaced with whatever boolean literal corresponds to the first one.
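The following self-contained Java sketch illustrates the failure mode. It assumes, as a simplification, that the substitution map is a pair of parallel lists searched linearly with equals(), first match wins. `Sub`, `BrokenSub`, `ConservativeSub`, `SubstitutionMap` and `foldConjunction` are hypothetical names, not Impala classes. `BrokenSub.equals()` deliberately treats every subquery as equal to model the bug; `ConservativeSub` models one possible conservative fix (identity comparison), which is not necessarily what the actual commit implements.

```java
import java.util.ArrayList;
import java.util.List;

public class SubqueryEqualsDemo {
  /** Hypothetical stand-in for a subquery expression (not Impala's Subquery). */
  static class Sub {
    final String sql;
    Sub(String sql) { this.sql = sql; }
    @Override public String toString() { return sql; }
  }

  /** Models the bug: any two BrokenSub instances compare equal. */
  static final class BrokenSub extends Sub {
    BrokenSub(String sql) { super(sql); }
    @Override public boolean equals(Object o) { return o instanceof BrokenSub; }
    @Override public int hashCode() { return 0; } // consistent with the broken equals()
  }

  /** Models a conservative fix: inherits Object.equals(), so only the same instance matches. */
  static final class ConservativeSub extends Sub {
    ConservativeSub(String sql) { super(sql); }
  }

  /** Simplified substitution map: parallel lists, the first equals() match wins. */
  static final class SubstitutionMap {
    private final List<Object> lhs = new ArrayList<>();
    private final List<Boolean> rhs = new ArrayList<>();

    void put(Object expr, boolean literal) { lhs.add(expr); rhs.add(literal); }

    Boolean get(Object expr) {
      for (int i = 0; i < lhs.size(); i++) {
        if (lhs.get(i).equals(expr)) return rhs.get(i);
      }
      return null; // no substitution registered for expr
    }
  }

  /**
   * Registers the compile-time value of NOT EXISTS for each subquery, then
   * evaluates "NOT EXISTS (a) AND NOT EXISTS (b)" by looking both up again.
   */
  static boolean foldConjunction(Sub a, boolean notExistsA, Sub b, boolean notExistsB) {
    SubstitutionMap smap = new SubstitutionMap();
    smap.put(a, notExistsA);
    smap.put(b, notExistsB);
    return smap.get(a) && smap.get(b);
  }

  public static void main(String[] args) {
    // NOT EXISTS over the LIMIT 0 subquery is TRUE; NOT EXISTS over the
    // min() subquery is FALSE. The conjunction should therefore be FALSE.
    System.out.println(foldConjunction(
        new BrokenSub("limit 0 subquery"), true, new BrokenSub("min() subquery"), false));  // true (wrong)
    System.out.println(foldConjunction(
        new BrokenSub("min() subquery"), false, new BrokenSub("limit 0 subquery"), true));  // false
    System.out.println(foldConjunction(
        new ConservativeSub("limit 0 subquery"), true,
        new ConservativeSub("min() subquery"), false));                                     // false
  }
}
```

With the broken equals(), the lookup for the second subquery always returns the first entry, so the result depends on subquery order, mirroring the two plans above. Identity comparison can never report a false match, at the cost of missing matches between structurally identical subqueries, which is one way to read "correct but conservative".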

        Activity

        Alexander Behm added a comment:

        commit c5f49ec9bbc1b191545b6edd2848acb632b45973
        Author: Alex Behm <alex.behm@cloudera.com>
        Date: Wed Nov 2 10:54:32 2016 -0700

        IMPALA-4423: Correct but conservative implementation of Subquery.equals().

        The underlying problem was for trivial/constant [NOT] EXISTS subqueries
        we substituted out Subqueries with bool literals using an ExprSubstitutionMap,
        but the Subquery.equals() function was not implemented properly, so we ended
        up matching Subqueries to the wrong entry in the ExprSubstitutionMap.
        This could ultimately lead to wrong plans and results.

        Testing: Corrected an existing test and modified an existing test for
        extra coverage.

        Change-Id: I5562d98ce36507aa5e253323e184fd42b54f27ed
        Reviewed-on: http://gerrit.cloudera.org:8080/4923
        Reviewed-by: Alex Behm <alex.behm@cloudera.com>
        Tested-by: Internal Jenkins


          People

          • Assignee:
            Alexander Behm
            Reporter:
            Alexander Behm
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:
