Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2434

Incorrect Result: Union All clause causes fewer rows to be returned

    XMLWordPrintableJSON

Details

    Description

      Query 1:

      SELECT
      COALESCE(t208.pos, t208.pos, t208.pos) AS int_col,
      COALESCE(t208.pos, t207.field_0.field_9, t207.field_0.field_4) AS int_col_t209
      FROM table_0 t207
      INNER JOIN t207.field_0.field_5 t208
      UNION ALL
      SELECT
      (t211.field_40) * (t211.field_26.field_28.field_31) AS int_col,
      t211.field_26.field_28.field_33 AS float_col
      FROM table_0 t210
      INNER JOIN table_1 t211 ON ((t211.field_26.field_28.field_31) = (t210.field_13)) AND ((t211.field_44) = (t210.field_0.field_9))
      WHERE
      (t210.field_0.field_9) NOT IN (t210.field_13, 776.07)
      

      Query 1 Plan:

      +-------------------------------------------------------------------------------------------------------------------+
      | Explain String                                                                                                    |
      +-------------------------------------------------------------------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=2.23GB VCores=2                                                           |
      | WARNING: The following tables are missing relevant table and/or column statistics.                                |
      | random_nested_db_0.table_0, random_nested_db_0.table_1                                                            |
      |                                                                                                                   |
      | 10:EXCHANGE [UNPARTITIONED]                                                                                       |
      | |                                                                                                                 |
      | 00:UNION                                                                                                          |
      | |                                                                                                                 |
      | |--02:SUBPLAN                                                                                                     |
      | |  |                                                                                                              |
      | |  |--05:NESTED LOOP JOIN [CROSS JOIN]                                                                            |
      | |  |  |                                                                                                           |
      | |  |  |--03:SINGULAR ROW SRC                                                                                      |
      | |  |  |                                                                                                           |
      | |  |  04:UNNEST [t207.field_0.field_5 t208]                                                                       |
      | |  |                                                                                                              |
      | |  01:SCAN HDFS [random_nested_db_0.table_0 t207]                                                                 |
      | |     partitions=1/1 files=1 size=32.98MB                                                                         |
      | |                                                                                                                 |
      | 08:HASH JOIN [INNER JOIN, BROADCAST]                                                                              |
      | |  hash predicates: (t210.field_13) = (t211.field_26.field_28.field_31), (t210.field_0.field_9) = (t211.field_44) |
      | |                                                                                                                 |
      | |--09:EXCHANGE [BROADCAST]                                                                                        |
      | |  |                                                                                                              |
      | |  07:SCAN HDFS [random_nested_db_0.table_1 t211]                                                                 |
      | |     partitions=1/1 files=1 size=5.31MB                                                                          |
      | |     predicates: t211.field_44 NOT IN (t211.field_26.field_28.field_31, 776.07)                                  |
      | |                                                                                                                 |
      | 06:SCAN HDFS [random_nested_db_0.table_0 t210]                                                                    |
      |    partitions=1/1 files=1 size=32.98MB                                                                            |
      |    predicates: (t210.field_0.field_9) NOT IN (t210.field_13, 776.07)                                              |
      +-------------------------------------------------------------------------------------------------------------------+
      

      Query 2:

      SELECT
      COALESCE(t208.pos, t208.pos, t208.pos) AS int_col,
      COALESCE(t208.pos, t207.field_0.field_9, t207.field_0.field_4) AS int_col_t209
      FROM table_0 t207
      INNER JOIN t207.field_0.field_5 t208
      

      Query 2 Plan:

      +------------------------------------------------------------------------------------+
      | Explain String                                                                     |
      +------------------------------------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=144.00MB VCores=1                          |
      | WARNING: The following tables are missing relevant table and/or column statistics. |
      | random_nested_db_0.table_0                                                         |
      |                                                                                    |
      | 05:EXCHANGE [UNPARTITIONED]                                                        |
      | |                                                                                  |
      | 01:SUBPLAN                                                                         |
      | |                                                                                  |
      | |--04:NESTED LOOP JOIN [CROSS JOIN]                                                |
      | |  |                                                                               |
      | |  |--02:SINGULAR ROW SRC                                                          |
      | |  |                                                                               |
      | |  03:UNNEST [t207.field_0.field_5 t208]                                           |
      | |                                                                                  |
      | 00:SCAN HDFS [random_nested_db_0.table_0 t207]                                     |
      |    partitions=1/1 files=1 size=32.98MB                                             |
      +------------------------------------------------------------------------------------+
      

      Query 1 returned 1024 rows.
      Query 2 returned 27309 rows.

      To access the machine with the loaded data:

      ssh dev@vd0206.halxg.cloudera.com -p 33334 (pw: cloudera)
      run-all.sh && start-impala-cluster.py
      impala-shell.sh
      use random_nested_db_0;
      

      core dumps are saved to /tmp/core_files

      Attachments

        Activity

          People

            alex.behm Alexander Behm
            tarasbob Taras Bobrovytsky
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: