Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Impala 3.4.0
-
None
-
ghx-label-13
Description
Consider the tpcds customer table .. its c_current_addr_sk column has #Nulls = 0 as shown below.
tpcds> show column stats customer; +------------------------+--------+------------------+--------+----------+-------------------+ | Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size | +------------------------+--------+------------------+--------+----------+-------------------+ .... | c_current_cdemo_sk | INT | 91558 | 3438 | 4 | 4 | | c_current_hdemo_sk | INT | 7376 | 3431 | 4 | 4 | | c_current_addr_sk | INT | 42003 | 0 | 4 | 4 | ....
The cardinality estimate for the following predicates shows a default selectivity of 10% being applied which is not correct:
explain select c_current_addr_sk from customer where c_current_addr_sk is not null; | 00:SCAN HDFS [tpcds.customer] | | HDFS partitions=1/1 files=1 size=12.60MB | | predicates: c_current_addr_sk IS NOT NULL | | row-size=4B cardinality=10.00K | +------------------------------------------------------------+ explain select c_current_addr_sk from customer where c_current_addr_sk is null; | 00:SCAN HDFS [tpcds.customer] | | HDFS partitions=1/1 files=1 size=12.60MB | | predicates: c_current_addr_sk IS NULL | | row-size=4B cardinality=10.00K |
Attachments
Issue Links
- is duplicated by
-
IMPALA-8050 IS [NOT] NULL gives wrong selectivity when null count is missing
- Resolved