[SPARK-27351] Wrong outputRows estimation after AggregateEstimation with only null value column - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.1
Fix Version/s: 2.4.2, 3.0.0
Component/s: SQL
Labels: None

Description

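The upper bound on the number of output rows of an aggregate is estimated by multiplying the distinct counts of the group-by columns. However, a column that contains only null values has a distinct count of 0, so the product, and therefore the estimated output row count, becomes 0. This is incorrect, because the null values still form a group.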

Example:
col1 (distinct: 2, rowCount: 2)
col2 (distinct: 0, rowCount: 2)

group by col1, col2
Actual: output rows: 0
Expected: output rows: 2

var outputRows: BigInt = agg.groupingExpressions.foldLeft(BigInt(1))(
        (res, expr) => res * childStats.attributeStats(expr.asInstanceOf[Attribute]).distinctCount)
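To make the arithmetic in the example concrete, here is a minimal, self-contained sketch. It does not use Spark's classes: `ColStat`, `AggregateRowEstimate`, `naive`, and `nullAware` are hypothetical names introduced only for illustration. The null-aware variant is one possible adjustment (treat an all-null column as contributing a single group, then cap by the child row count), not necessarily the change made in the linked pull requests.

```scala
// Hypothetical stand-in for per-column statistics; this is NOT Spark's ColumnStat.
final case class ColStat(distinctCount: BigInt, nullCount: BigInt)

object AggregateRowEstimate {
  // Naive upper bound: the product of the distinct counts of the group-by columns.
  // A column holding only nulls has distinctCount == 0 and zeroes the whole product.
  def naive(groupBy: Seq[ColStat]): BigInt =
    groupBy.foldLeft(BigInt(1))((res, stat) => res * stat.distinctCount)

  // Assumed adjustment: a column with no non-null distinct values but at least one
  // null still yields one group (the null group), so it contributes a factor of 1.
  // The result is capped by the child row count (also an assumption of this sketch).
  def nullAware(groupBy: Seq[ColStat], childRowCount: BigInt): BigInt = {
    val product = groupBy.foldLeft(BigInt(1)) { (res, stat) =>
      val groups =
        if (stat.distinctCount == BigInt(0) && stat.nullCount > BigInt(0)) BigInt(1)
        else stat.distinctCount
      res * groups
    }
    product.min(childRowCount)
  }

  def main(args: Array[String]): Unit = {
    // col1: 2 distinct values, no nulls; col2: only nulls (2 rows).
    val stats = Seq(ColStat(BigInt(2), BigInt(0)), ColStat(BigInt(0), BigInt(2)))
    println(naive(stats))                // prints 0
    println(nullAware(stats, BigInt(2))) // prints 2
  }
}
```

With the statistics from the example, the naive product is 2 * 0 = 0, while the null-aware variant gives 2 * 1 = 2, matching the expected output row count.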

Issue Links

is related to

SPARK-27539 Fix inaccurate aggregate outputRows estimation with column containing null values (Resolved)

links to

GitHub Pull Request #24286

GitHub Pull Request #24691

People

Assignee: peng bo

Reporter: peng bo

Votes: 0

Watchers: 2

Dates

Created: 03/Apr/19 06:58

Updated: 23/May/19 20:41

Resolved: 23/May/19 20:41