Description
This issue is a follow-up to https://github.com/apache/spark/pull/24286. As smilegator pointed out, the statistics for a column that contains null values are inaccurate as well.
spark-sql> select key from test;
2
NULL
1
spark-sql> desc extended test key;
col_name key
data_type int
comment NULL
min 1
max 2
num_nulls 1
distinct_count 2
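When the column contains null values, the distinct count should be distinct_count + 1. In the output above, distinct_count is 2 (the values 1 and 2) and the NULL row is reported only in num_nulls, so the column actually has 3 distinct entries once NULL is included.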
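The proposed adjustment can be sketched as a small function. This is only an illustration of the arithmetic, not Spark's actual estimation code: the function name `effective_distinct_count` and its parameters are hypothetical, chosen to mirror the `distinct_count` and `num_nulls` fields in the `desc extended` output.

```python
def effective_distinct_count(distinct_count: int, num_nulls: int) -> int:
    """Return the distinct count with NULL counted as one extra entry.

    Hypothetical helper, not part of Spark. It assumes distinct_count
    excludes NULL, as in the `desc extended` output in this issue.
    """
    if num_nulls > 0:
        # All NULL rows collapse into a single additional distinct entry.
        return distinct_count + 1
    return distinct_count


# Statistics reported for column `key` in this issue.
print(effective_distinct_count(distinct_count=2, num_nulls=1))
```

With the statistics from this issue (distinct_count 2, num_nulls 1) the sketch returns 3. A column containing only NULLs (distinct_count 0) would yield 1 rather than 0.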
Issue Links
- relates to SPARK-27351 Wrong outputRows estimation after AggregateEstimation with only null value column (Resolved)