Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Impala 2.0
-
None
-
None
Description
Please see the queries below. There seems to be a special combination that triggers a bug where the predicate is dropped.
The "where foo = 10" is not applied
[localhost:21000] > select foo, sum(distinct foo) from (select int_col + int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10; Query: select foo, sum(distinct foo) from (select int_col + int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10 +------+-------------------+ | foo | sum(distinct foo) | +------+-------------------+ | 386 | 386 | | 642 | 642 | | 592 | 592 | | 976 | 976 | | 884 | 884 | | 934 | 934 | | 678 | 678 | | 422 | 422 | | 622 | 622 | | 1006 | 1006 | +------+-------------------+ Fetched 10 row(s) in 0.12s [localhost:21000] > explain select foo, sum(distinct foo) from (select int_col + int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10; Query: explain select foo, sum(distinct foo) from (select int_col + int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10 +----------------------------------------------------------+ | Explain String | +----------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=90.00MB VCores=2 | | | | 05:EXCHANGE [UNPARTITIONED] | | | limit: 10 | | | | | 02:AGGREGATE [MERGE FINALIZE] | | | output: sum(foo) | | | group by: foo | | | limit: 10 | | | | | 04:AGGREGATE [MERGE] | | | group by: foo, foo | | | | | 03:EXCHANGE [HASH(foo)] | | | | | 01:AGGREGATE | | | group by: int_col + int_col, int_col + int_col | | | | | 00:SCAN HDFS [functional.alltypesagg] | | partitions=11/11 size=814.73KB | +----------------------------------------------------------+ Fetched 20 row(s) in 0.02s
Changing the expr in the inline view from "int_col + int_col" to just "int_col" fixes the problem...
[localhost:21000] > select foo, sum(distinct foo) from (select int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10; Query: select foo, sum(distinct foo) from (select int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10 +-----+-------------------+ | foo | sum(distinct foo) | +-----+-------------------+ | 10 | 10 | +-----+-------------------+ Fetched 1 row(s) in 0.12s
also removing the distinct fixes the problem
[localhost:21000] > select foo, sum(foo) from (select int_col + int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10; Query: select foo, sum(foo) from (select int_col + int_col as foo from alltypesagg) t where foo = 10 group by foo limit 10 +-----+----------+ | foo | sum(foo) | +-----+----------+ | 10 | 100 | +-----+----------+ Fetched 1 row(s) in 0.13s
also the equivalent query without the inline view does not have the problem
[localhost:21000] > select int_col + int_col, sum(distinct int_col + int_col) from alltypesagg where int_col + int_col = 10 group by int_col + int_col limit 10; Query: select int_col + int_col, sum(distinct int_col + int_col) from alltypesagg where int_col + int_col = 10 group by int_col + int_col limit 10 +-------------------+---------------------------------+ | int_col + int_col | sum(distinct int_col + int_col) | +-------------------+---------------------------------+ | 10 | 10 | +-------------------+---------------------------------+ Fetched 1 row(s) in 0.13s