Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-1165

Predicate dropped: Inline view + DISTINCT aggregate in outer query

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 2.0
    • Impala 2.0
    • None
    • None

    Description

      Please see the queries below. There seems to be a special combination that triggers a bug where the predicate is dropped.

      The "where foo = 10" is not applied

      [localhost:21000] > select foo, sum(distinct foo) from (select int_col + int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10;
      Query: select foo, sum(distinct foo) from (select int_col + int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10
      +------+-------------------+
      | foo  | sum(distinct foo) |
      +------+-------------------+
      | 386  | 386               |
      | 642  | 642               |
      | 592  | 592               |
      | 976  | 976               |
      | 884  | 884               |
      | 934  | 934               |
      | 678  | 678               |
      | 422  | 422               |
      | 622  | 622               |
      | 1006 | 1006              |
      +------+-------------------+
      Fetched 10 row(s) in 0.12s
      
      
      [localhost:21000] > explain select foo, sum(distinct foo) from (select int_col + int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10;
      Query: explain select foo, sum(distinct foo) from (select int_col + int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10
      +----------------------------------------------------------+
      | Explain String                                           |
      +----------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=90.00MB VCores=2 |
      |                                                          |
      | 05:EXCHANGE [UNPARTITIONED]                              |
      | |  limit: 10                                             |
      | |                                                        |
      | 02:AGGREGATE [MERGE FINALIZE]                            |
      | |  output: sum(foo)                                      |
      | |  group by: foo                                         |
      | |  limit: 10                                             |
      | |                                                        |
      | 04:AGGREGATE [MERGE]                                     |
      | |  group by: foo, foo                                    |
      | |                                                        |
      | 03:EXCHANGE [HASH(foo)]                                  |
      | |                                                        |
      | 01:AGGREGATE                                             |
      | |  group by: int_col + int_col, int_col + int_col        |
      | |                                                        |
      | 00:SCAN HDFS [functional.alltypesagg]                    |
      |    partitions=11/11 size=814.73KB                        |
      +----------------------------------------------------------+
      Fetched 20 row(s) in 0.02s
      

      Changing the expr in the inline view from "int_col + int_col" to just "int_col" fixes the problem...

      [localhost:21000] > select foo, sum(distinct foo) from (select int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10;
      Query: select foo, sum(distinct foo) from (select int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10
      +-----+-------------------+
      | foo | sum(distinct foo) |
      +-----+-------------------+
      | 10  | 10                |
      +-----+-------------------+
      Fetched 1 row(s) in 0.12s
      

      also removing the distinct fixes the problem

      [localhost:21000] > select foo, sum(foo) from (select int_col + int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10;
      Query: select foo, sum(foo) from (select int_col + int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10
      +-----+----------+
      | foo | sum(foo) |
      +-----+----------+
      | 10  | 100      |
      +-----+----------+
      Fetched 1 row(s) in 0.13s
      

      also the equivalent query without the inline view does not have the problem

      [localhost:21000] > select int_col + int_col, sum(distinct int_col + int_col) from alltypesagg where int_col + int_col = 10 group by int_col + int_col limit 10;
      Query: select int_col + int_col, sum(distinct int_col + int_col) from alltypesagg where int_col + int_col = 10 group by int_col + int_col limit 10
      +-------------------+---------------------------------+
      | int_col + int_col | sum(distinct int_col + int_col) |
      +-------------------+---------------------------------+
      | 10                | 10                              |
      +-------------------+---------------------------------+
      Fetched 1 row(s) in 0.13s
      

      Attachments

        Activity

          People

            dtsirogiannis Dimitris Tsirogiannis
            caseyc casey
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: