[SPARK-22983] Don't push filters beneath aggregates with empty grouping expressions - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.1.0, 2.2.0, 2.3.0
Fix Version/s: 2.2.2, 2.3.0
Component/s: SQL
Labels:
- correctness

Target Version/s: 2.3.0

Description

The following SQL query should return zero rows, but in Spark it actually returns one row:

SELECT 1 FROM (
  SELECT 1 AS z,
         MIN(a.x)
  FROM (SELECT 1 AS x) a
  WHERE false
) b
WHERE b.z != b.z
The problem stems from the `PushDownPredicate` rule. When this rule encounters a filter on top of an Aggregate operator, e.g. `Filter(Agg(...))`, it removes the original filter and adds a new filter onto the Aggregate's child, e.g. `Agg(Filter(...))`. This is often okay, but the case above is a counterexample. Because there is no explicit `GROUP BY`, we are implicitly computing a global aggregate over the entire table, and a global aggregate produces exactly one output row even when its input is empty. The original filter was therefore not acting like a `HAVING` clause that filters out groups: it was removing that single output row. Once the filter is pushed beneath the Aggregate it can only remove input rows, so it no longer reduces the cardinality of the Aggregate's output, and the query returns the wrong answer.
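The difference between the two plan shapes can be reproduced outside Spark. The following is a minimal sketch in plain Python, not Spark code: `filter_rows` and `global_aggregate` are hypothetical stand-ins for the Filter and Aggregate operators, and the pushed-down predicate is written as `1 != 1` on the assumption that the reference to `z` is replaced by the literal it aliases.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[int]]


def filter_rows(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Toy Filter operator: keep only rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def global_aggregate(rows: List[Row]) -> List[Row]:
    """Toy Aggregate with no grouping expressions.

    It always emits exactly one row, even when the input is empty,
    mirroring SELECT 1 AS z, MIN(x) without a GROUP BY.
    """
    xs = [r["x"] for r in rows if r["x"] is not None]
    return [{"z": 1, "min_x": min(xs) if xs else None}]


table: List[Row] = [{"x": 1}]                        # (SELECT 1 AS x) a
inner_input = filter_rows(table, lambda r: False)    # WHERE false

# Original plan shape: Filter(Agg(...)). The filter sees the single aggregate row.
correct = filter_rows(global_aggregate(inner_input), lambda r: r["z"] != r["z"])

# Pushed-down plan shape: Agg(Filter(...)). The filter only sees input rows.
pushed = global_aggregate(filter_rows(inner_input, lambda r: 1 != 1))

print(len(correct))  # 0
print(len(pushed))   # 1
```

In this model the original plan yields no rows, while the pushed-down plan still yields the one row that the global aggregate always emits.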

A simple fix is to never push down filters beneath aggregates when there are no grouping expressions.
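The shape of such a guard can be sketched with a toy plan representation. This is an illustrative Python model under stated assumptions, not the code from the pull request: the `Relation`, `Filter`, and `Aggregate` classes and the `push_filter_through_aggregate` function are hypothetical, and the sketch assumes the filter condition is otherwise eligible for pushdown (for example, that it only references grouping columns).

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Relation:
    name: str


@dataclass(frozen=True)
class Filter:
    condition: str
    child: "Plan"


@dataclass(frozen=True)
class Aggregate:
    grouping: Tuple[str, ...]  # empty tuple means a global aggregate
    child: "Plan"


Plan = Union[Relation, Filter, Aggregate]


def push_filter_through_aggregate(plan: Plan) -> Plan:
    """Rewrite Filter(Agg(child)) to Agg(Filter(child)) only when it is safe.

    Assumption: the condition is already known to be pushable apart from
    the grouping check; this toy rule performs only that check.
    """
    if isinstance(plan, Filter) and isinstance(plan.child, Aggregate):
        agg = plan.child
        if not agg.grouping:
            # Global aggregate: leave the filter on top so it can still
            # remove the single output row.
            return plan
        return Aggregate(agg.grouping, Filter(plan.condition, agg.child))
    return plan


global_plan = Filter("z != z", Aggregate((), Relation("a")))
grouped_plan = Filter("k > 0", Aggregate(("k",), Relation("t")))

unchanged = push_filter_through_aggregate(global_plan)
rewritten = push_filter_through_aggregate(grouped_plan)
```

Here `unchanged` is the original plan with the filter still above the global aggregate, while `rewritten` has the filter moved beneath the grouped aggregate.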

Issue Links

links to: [GitHub] Pull Request #20180 (JoshRosen)

People

Assignee: Josh Rosen
Reporter: Josh Rosen
Votes: 0
Watchers: 2

Dates

Created: 07/Jan/18 23:41
Updated: 08/Jan/18 08:06
Resolved: 08/Jan/18 08:06