Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-10182

Rows with NULLs filtered out with duplicate columns in subquery select inside UNION ALL

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.0
    • Component/s: Frontend
    • Labels:
    • Target Version:
    • Epic Color:
      ghx-label-2

      Description

      Bug report from here - https://community.cloudera.com/t5/Support-Questions/quot-union-all-quot-dropping-records-with-all-null-empty/m-p/303153#M221415

      Repro:

      create database if not exists as_adventure;
      use as_adventure;
      CREATE tABLE IF NOT EXISTS
          as_adventure.t1 
          ( 
              productsubcategorykey INT, 
              productline STRING);
      
      insert into t1 values (1,'l1');
      insert into t1 values (2,'l1');
      insert into t1 values (1,'l2');
      insert into t1 values (3,'l3');
      insert into t1 values (null,'');
      
      select * from t1; 
      
      SELECT
          MIN(t_53.c_41)       c_41,
          CAST(NULL AS DOUBLE) c_43,
          CAST(NULL AS BIGINT) c_44,
          t_53.c2              c2,
          t_53.c3s0            c3s0,
          t_53.c4              c4,
          t_53.c5s0            c5s0
      FROM
          (   SELECT
                  t.productsubcategorykey c_41,
                  t.productline           c2,
                  t.productline           c3s0,
                  t.productsubcategorykey c4,
                  t.productsubcategorykey c5s0
              FROM
                  as_adventure.t1 t
              WHERE
                  true
              GROUP BY
                  2,
                  3,
                  4,
                  5 ) t_53
      GROUP BY
          4,
          5,
          6,
          7
       
      UNION ALL
      
      SELECT
          MIN(t_53.c_41)       c_41,
          CAST(NULL AS DOUBLE) c_43,
          CAST(NULL AS BIGINT) c_44,
          t_53.c2              c2,
          t_53.c3s0            c3s0,
          t_53.c4              c4,
          t_53.c5s0            c5s0
      FROM
          (   SELECT
                  t.productsubcategorykey c_41,
                  t.productline           c2,
                  t.productline           c3s0,
                  t.productsubcategorykey c4,
                  t.productsubcategorykey c5s0
              FROM
                  as_adventure.t1 t
              WHERE
                  true
              GROUP BY
                  2,
                  3,
                  4,
                  5 ) t_53
      GROUP BY
          4,
          5,
          6,
          7
      

      Somewhat similar to IMPALA-7957 in that the inferred predicates from the column equivalences get placed in a Select node. It's a bit different in that the NULLs that are filtered out from the predicates come from the base table.

      +------------------------------------------------------------------------------------------------+
      | Explain String                                                                                 |
      +------------------------------------------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=136.02MB Threads=6                                   |
      | Per-Host Resource Estimates: Memory=576MB                                                      |
      | WARNING: The following tables are missing relevant table and/or column statistics.             |
      | as_adventure.t1                                                                                |
      |                                                                                                |
      | PLAN-ROOT SINK                                                                                 |
      | |                                                                                              |
      | 13:EXCHANGE [UNPARTITIONED]                                                                    |
      | |                                                                                              |
      | 00:UNION                                                                                       |
      | |  row-size=52B cardinality=2                                                                  |
      | |                                                                                              |
      | |--08:SELECT                                                                                   |
      | |  |  predicates: t_53.c4 = t_53.c5s0, t_53.c2 = t_53.c3s0                                     |
      | |  |  row-size=36B cardinality=1                                                               |
      | |  |                                                                                           |
      | |  07:AGGREGATE [FINALIZE]                                                                     |
      | |  |  output: min(t.productsubcategorykey)                                                     |
      | |  |  group by: t.productline, t.productline, t.productsubcategorykey, t.productsubcategorykey |
      | |  |  row-size=36B cardinality=2                                                               |
      | |  |                                                                                           |
      | |  12:AGGREGATE [FINALIZE]                                                                     |
      | |  |  group by: t.productline, t.productsubcategorykey                                         |
      | |  |  row-size=16B cardinality=2                                                               |
      | |  |                                                                                           |
      | |  11:EXCHANGE [HASH(t.productline,t.productsubcategorykey)]                                   |
      | |  |                                                                                           |
      | |  06:AGGREGATE [STREAMING]                                                                    |
      | |  |  group by: t.productline, t.productsubcategorykey                                         |
      | |  |  row-size=16B cardinality=2                                                               |
      | |  |                                                                                           |
      | |  05:SCAN HDFS [as_adventure.t1 t]                                                            |
      | |     HDFS partitions=1/1 files=5 size=24B                                                     |
      | |     row-size=16B cardinality=2                                                               |
      | |                                                                                              |
      | 04:SELECT                                                                                      |
      | |  predicates: t_53.c4 = t_53.c5s0, t_53.c2 = t_53.c3s0                                        |
      | |  row-size=36B cardinality=1                                                                  |
      | |                                                                                              |
      | 03:AGGREGATE [FINALIZE]                                                                        |
      | |  output: min(t.productsubcategorykey)                                                        |
      | |  group by: t.productline, t.productline, t.productsubcategorykey, t.productsubcategorykey    |
      | |  row-size=36B cardinality=2                                                                  |
      | |                                                                                              |
      | 10:AGGREGATE [FINALIZE]                                                                        |
      | |  group by: t.productline, t.productsubcategorykey                                            |
      | |  row-size=16B cardinality=2                                                                  |
      | |                                                                                              |
      | 09:EXCHANGE [HASH(t.productline,t.productsubcategorykey)]                                      |
      | |                                                                                              |
      | 02:AGGREGATE [STREAMING]                                                                       |
      | |  group by: t.productline, t.productsubcategorykey                                            |
      | |  row-size=16B cardinality=2                                                                  |
      | |                                                                                              |
      | 01:SCAN HDFS [as_adventure.t1 t]                                                               |
      |    HDFS partitions=1/1 files=5 size=24B                                                        |
      |    row-size=16B cardinality=2                                                                  |
      +------------------------------------------------------------------------------------------------+
      
      

      This query is a simplified version of the original one:

      SELECT
          t_53.c4              c4,
          t_53.c5s0            c5s0
      FROM
          (   SELECT
                  t.productsubcategorykey c4,
                  t.productsubcategorykey c5s0
              FROM
                  as_adventure.t1 t
              GROUP BY
                  1,
                  2 ) t_53
      GROUP BY 1,2
      
      UNION ALL
      
      SELECT
          t_53.c4              c4,
          t_53.c5s0            c5s0
      FROM
          (   SELECT
                  t.productsubcategorykey c4,
                  t.productsubcategorykey c5s0
              FROM
                  as_adventure.t1 t
              GROUP BY
                  1, 2 ) t_53
      GROUP BY 1, 2;
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                amansinha Aman Sinha
                Reporter:
                tarmstrong Tim Armstrong
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: