Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-7957

UNION ALL query returns incorrect results

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • Impala 2.12.0
    • Impala 3.3.0
    • Frontend

    Description

      Synopsis:
      =========
      UNION ALL query returns incorrect results

      Problem:
      ========
      Customer reported a UNION ALL query returning incorrect results. The UNION ALL query has 2 legs, but Impala is only returning information from one leg.

      Issue can be reproduced in the latest version of Impala. Below is the reproduction case:

      create table mytest_t (c1 timestamp, c2 timestamp, c3 int, c4 int);
      insert into mytest_t values (now(), ADDDATE (now(),1), 1,1);
      insert into mytest_t values (now(), ADDDATE (now(),1), 2,2);
      insert into mytest_t values (now(), ADDDATE (now(),1), 3,3);
      
      SELECT t.c1
      FROM
       (SELECT c1, c2
       FROM mytest_t) t
      LEFT JOIN
       (SELECT c1, c2
       FROM mytest_t
       WHERE c2 = c1) t2 ON (t.c2 = t2.c2)
      UNION ALL
      VALUES (NULL)
      

      The above query produces the following execution plan:

      +------------------------------------------------------------------------------------+
      | Explain String                                                                     |
      +------------------------------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=34.02MB Threads=5                        |
      | Per-Host Resource Estimates: Memory=2.06GB                                         |
      | WARNING: The following tables are missing relevant table and/or column statistics. |
      | default.mytest_t                                                                   |
      |                                                                                    |
      | PLAN-ROOT SINK                                                                     |
      | |                                                                                  |
      | 06:EXCHANGE [UNPARTITIONED]                                                        |
      | |                                                                                  |
      | 00:UNION                                                                           |
      | |  constant-operands=1                                                             |
      | |                                                                                  |
      | 04:SELECT                                                                          |
      | |  predicates: default.mytest_t.c1 = default.mytest_t.c2                           |
      | |                                                                                  |
      | 03:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                                          |
      | |  hash predicates: c2 = c2                                                        |
      | |                                                                                  |
      | |--05:EXCHANGE [BROADCAST]                                                         |
      | |  |                                                                               |
      | |  02:SCAN HDFS [default.mytest_t]                                                 |
      | |     partitions=1/1 files=3 size=192B                                             |
      | |     predicates: c2 = c1                                                          |
      | |                                                                                  |
      | 01:SCAN HDFS [default.mytest_t]                                                    |
      |    partitions=1/1 files=3 size=192B                                                |
      +------------------------------------------------------------------------------------+
      

      The issue is in operator 4:

      | 04:SELECT |
      | | predicates: default.mytest_t.c1 = default.mytest_t.c2 |
      

      It's definitely a bug with predicate placement - that c1 = c2 predicate shouldn't be evaluated outside the right branch of the LEFT JOIN.

      Thanks,
      Luis Martinez.

      Attachments

        Issue Links

          Activity

            People

              stigahuang Quanlong Huang
              lmartinez@cloudera.com Luis E Martinez-Poblete
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: