Details
-
Bug
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
Description
In the below example, the case expression only has two possible values but the planner estimates 7300 distinct values. This is a fairly common pattern in real-world queries and we could get a better estimate by analysing the case expression.
[localhost:21000] > set explain_level=2; EXPLAIN_LEVEL set to 2 [localhost:21000] > explain select distinct case when id = 1 then 'yes' else 'no' end from functional.alltypes; Query: explain select distinct case when id = 1 then 'yes' else 'no' end from functional.alltypes +---------------------------------------------------------------+ | Explain String | +---------------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=180.00MB VCores=2 | | | | PLAN-ROOT SINK | | | | | 04:EXCHANGE [UNPARTITIONED] | | | hosts=3 per-host-mem=unavailable | | | tuple-ids=1 row-size=16B cardinality=7300 | | | | | 03:AGGREGATE [FINALIZE] | | | group by: CASE WHEN id = 1 THEN 'yes' ELSE 'no' END | | | hosts=3 per-host-mem=10.00MB | | | tuple-ids=1 row-size=16B cardinality=7300 | | | | | 02:EXCHANGE [HASH(CASE WHEN id = 1 THEN 'yes' ELSE 'no' END)] | | | hosts=3 per-host-mem=0B | | | tuple-ids=1 row-size=16B cardinality=7300 | | | | | 01:AGGREGATE [STREAMING] | | | group by: CASE WHEN id = 1 THEN 'yes' ELSE 'no' END | | | hosts=3 per-host-mem=10.00MB | | | tuple-ids=1 row-size=16B cardinality=7300 | | | | | 00:SCAN HDFS [functional.alltypes, RANDOM] | | partitions=24/24 files=24 size=478.45KB | | table stats: 7300 rows total | | column stats: all | | hosts=3 per-host-mem=160.00MB | | tuple-ids=0 row-size=4B cardinality=7300 | +---------------------------------------------------------------+ Fetched 28 row(s) in 0.01s [localhost:21000] >