Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-845

Using distinct inside aggregate function may cause incorrect result when using having clause

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 1.2.4, Impala 1.3
    • Impala 1.3
    • None
    • None

    Description

      This happens in 1.2.4, 1.3.0, and master.

      It seems the combination of DISTINCT/HAVING in the first query is somehow causing a problem.

      [localhost:21000] > select COUNT(DISTINCT 1) FROM tinytable HAVING COUNT(1) IS NULL;
      Query: select COUNT(DISTINCT 1) FROM tinytable HAVING COUNT(1) IS NULL
      +-------------------+
      | count(distinct 1) |
      +-------------------+
      | 0                 |
      +-------------------+
      Returned 1 row(s) in 0.12s
      
      
      [localhost:21000] > explain select COUNT(DISTINCT 1) FROM tinytable HAVING COUNT(1) IS NULL;
      Query: explain select COUNT(DISTINCT 1) FROM tinytable HAVING COUNT(1) IS NULL
      +----------------------------------------------------------+
      | Explain String                                           |
      +----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=42.00MB VCores=2 |
      |                                                          |
      | 06:AGGREGATE [MERGE FINALIZE]                            |
      | |  output: sum(count(1)), sum(sum(count(1)))             |
      | |                                                        |
      | 05:EXCHANGE [PARTITION=UNPARTITIONED]                    |
      | |                                                        |
      | 02:AGGREGATE [MERGE]                                     |
      | |  output: count(1), sum(count(1))                       |
      | |  having: sum(count(1)) IS NULL                         |
      | |                                                        |
      | 04:AGGREGATE [MERGE]                                     |
      | |  output: sum(count(1))                                 |
      | |  group by: 1                                           |
      | |                                                        |
      | 03:EXCHANGE [PARTITION=HASH(1)]                          |
      | |                                                        |
      | 01:AGGREGATE                                             |
      | |  output: count(1)                                      |
      | |  group by: 1                                           |
      | |                                                        |
      | 00:SCAN HDFS [functional.tinytable]                      |
      |    partitions=1/1 size=38B                               |
      +----------------------------------------------------------+
      Returned 23 row(s) in 0.01s
      
      
      [localhost:21000] > select COUNT(1) FROM tinytable HAVING COUNT(1) IS NULL;
      Query: select COUNT(1) FROM tinytable HAVING COUNT(1) IS NULL
      
      Returned 0 row(s) in 0.12s
      
      
      [localhost:21000] > explain select COUNT(1) FROM tinytable HAVING COUNT(1) IS NULL;
      Query: explain select COUNT(1) FROM tinytable HAVING COUNT(1) IS NULL
      +----------------------------------------------------------+
      | Explain String                                           |
      +----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=42.00MB VCores=1 |
      |                                                          |
      | 03:AGGREGATE [MERGE FINALIZE]                            |
      | |  output: sum(count(1))                                 |
      | |  having: count(1) IS NULL                              |
      | |                                                        |
      | 02:EXCHANGE [PARTITION=UNPARTITIONED]                    |
      | |                                                        |
      | 01:AGGREGATE                                             |
      | |  output: count(1)                                      |
      | |                                                        |
      | 00:SCAN HDFS [functional.tinytable]                      |
      |    partitions=1/1 size=38B                               |
      +----------------------------------------------------------+
      Returned 13 row(s) in 0.02s
      
      
      [localhost:21000] > select COUNT(DISTINCT 1) FROM tinytable;
      Query: select COUNT(DISTINCT 1) FROM tinytable
      +-------------------+
      | count(distinct 1) |
      +-------------------+
      | 1                 |
      +-------------------+
      Returned 1 row(s) in 0.12s
      
      
      [localhost:21000] > explain select COUNT(DISTINCT 1) FROM tinytable;
      Query: explain select COUNT(DISTINCT 1) FROM tinytable
      +----------------------------------------------------------+
      | Explain String                                           |
      +----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=42.00MB VCores=2 |
      |                                                          |
      | 06:AGGREGATE [MERGE FINALIZE]                            |
      | |  output: sum(count(1))                                 |
      | |                                                        |
      | 05:EXCHANGE [PARTITION=UNPARTITIONED]                    |
      | |                                                        |
      | 02:AGGREGATE [MERGE]                                     |
      | |  output: count(1)                                      |
      | |                                                        |
      | 04:AGGREGATE [MERGE]                                     |
      | |  group by: 1                                           |
      | |                                                        |
      | 03:EXCHANGE [PARTITION=HASH(1)]                          |
      | |                                                        |
      | 01:AGGREGATE                                             |
      | |  group by: 1                                           |
      | |                                                        |
      | 00:SCAN HDFS [functional.tinytable]                      |
      |    partitions=1/1 size=38B                               |
      +----------------------------------------------------------+
      Returned 20 row(s) in 0.01s
      

      Attachments

        Activity

          People

            alex.behm Alexander Behm
            caseyc casey
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: