[SPARK-27539] Fix inaccurate aggregate outputRows estimation with column containing null values - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 2.4.3, 3.0.0
Component/s: SQL
Labels: None

Description

This issue is a follow-up to https://github.com/apache/spark/pull/24286. As smilegator pointed out, the outputRows estimate is also inaccurate for a column that contains null values.

spark-sql> select key from test;
2
NULL
1
spark-sql> desc extended test key;
col_name key
data_type int
comment NULL
min 1
max 2
num_nulls 1
distinct_count 2

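Here distinct_count is 2 because it counts only the non-null values, but grouping by key produces three groups, since NULL forms a group of its own. When the column contains null values, the distinct count used for the aggregate outputRows estimate should therefore be distinct_count + 1.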
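The adjustment can be illustrated with a small standalone sketch. This is not Spark's code: `ColumnStat`, `group_count`, and `estimate_output_rows` are hypothetical names, and the model (multiply the per-column group counts, then cap at the child's row count) is a simplifying assumption used only to show where the `+ 1` enters.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ColumnStat:
    """Hypothetical stand-in for the column statistics shown by `desc extended`."""
    distinct_count: int  # number of distinct non-null values
    null_count: int      # number of rows where the column is NULL


def group_count(stat: ColumnStat) -> int:
    """Number of groups one column contributes; NULL is a group of its own."""
    if stat.null_count > 0:
        return stat.distinct_count + 1
    return stat.distinct_count


def estimate_output_rows(child_rows: int, group_by: Iterable[ColumnStat]) -> int:
    """Assumed model: product of per-column group counts, capped at the input row count."""
    rows = 1
    for stat in group_by:
        rows *= group_count(stat)
    return min(rows, child_rows)


# Statistics from the example above: values 2, NULL, 1.
key = ColumnStat(distinct_count=2, null_count=1)
print(estimate_output_rows(3, [key]))  # 3
```

Without the null adjustment, the same inputs would give an estimate of 2 rows, one fewer than the three groups that `group by key` actually returns.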

Issue Links

relates to

SPARK-27351 Wrong outputRows estimation after AggregateEstimation with only null value column

Resolved

links to

GitHub Pull Request #24436

People

Assignee: peng bo

Reporter: peng bo

Votes: 0

Watchers: 2

Dates

Created: 22/Apr/19 08:16

Updated: 19/Aug/19 03:54

Resolved: 23/May/19 20:42