[SPARK-31663] Grouping sets with having clause returns the wrong result - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0
Fix Version/s: 2.4.6, 3.0.0
Component/s: SQL
Labels:
- correctness

Description

Grouping sets with having clause returns the wrong result when the condition of having contained conflicting naming. See the below example:

select sum(a) as b FROM VALUES (1, 10), (2, 20) AS T(a, b) group by GROUPING SETS ((b), (a, b)) having b > 10

The `b` in `having b > 10` should be resolved as `T.b` not `sum(a)`, so the right result should be

+---+
|  b|
+---+
|  2|
|  2|
+---+

instead of an empty result.

The root cause is similar to ~~SPARK-31519~~, it's caused by we parsed HAVING as Filter(..., Agg(...)) and resolved these two parts in different rules. The CUBE and ROLLUP have the same issue.

Other systems worked as expected, I checked PostgreSQL 9.6 and MS SQL Server 2017.

For Apache Spark 2.0.2 ~ 2.3.4, the following query is tested.

spark-sql> select sum(a) as b from t group by b grouping sets(b) having b > 10;
Time taken: 0.194 seconds

hive> select sum(a) as b from t group by b grouping sets(b) having b > 10;
2
Time taken: 1.605 seconds, Fetched: 1 row(s)

Attachments

Issue Links

links to

[Github] Pull Request #28501 (xuanyuanking)

Activity

People

Assignee:: Yuanjian Li

Reporter:: Yuanjian Li

Votes:: 0 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 08/May/20 16:24

Updated:: 01/Jun/20 17:32

Resolved:: 16/May/20 05:36