Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0
Description
Grouping sets with having clause returns the wrong result when the condition of having contained conflicting naming. See the below example:
select sum(a) as b FROM VALUES (1, 10), (2, 20) AS T(a, b) group by GROUPING SETS ((b), (a, b)) having b > 10
The `b` in `having b > 10` should be resolved as `T.b` not `sum(a)`, so the right result should be
+---+ | b| +---+ | 2| | 2| +---+
instead of an empty result.
The root cause is similar to SPARK-31519, it's caused by we parsed HAVING as Filter(..., Agg(...)) and resolved these two parts in different rules. The CUBE and ROLLUP have the same issue.
Other systems worked as expected, I checked PostgreSQL 9.6 and MS SQL Server 2017.
For Apache Spark 2.0.2 ~ 2.3.4, the following query is tested.
spark-sql> select sum(a) as b from t group by b grouping sets(b) having b > 10; Time taken: 0.194 seconds hive> select sum(a) as b from t group by b grouping sets(b) having b > 10; 2 Time taken: 1.605 seconds, Fetched: 1 row(s)