Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-31663

Grouping sets with having clause returns the wrong result

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0
    • 2.4.6, 3.0.0
    • SQL

    Description

      Grouping sets with having clause returns the wrong result when the condition of having contained conflicting naming. See the below example:

      select sum(a) as b FROM VALUES (1, 10), (2, 20) AS T(a, b) group by GROUPING SETS ((b), (a, b)) having b > 10

      The `b` in `having b > 10` should be resolved as `T.b` not `sum(a)`, so the right result should be

      +---+
      |  b|
      +---+
      |  2|
      |  2|
      +---+

      instead of an empty result.

      The root cause is similar to SPARK-31519, it's caused by we parsed HAVING as Filter(..., Agg(...)) and resolved these two parts in different rules. The CUBE and ROLLUP have the same issue.

      Other systems worked as expected, I checked PostgreSQL 9.6 and MS SQL Server 2017.

       

      For Apache Spark 2.0.2 ~ 2.3.4, the following query is tested.

      spark-sql> select sum(a) as b from t group by b grouping sets(b) having b > 10;
      Time taken: 0.194 seconds
      
      hive> select sum(a) as b from t group by b grouping sets(b) having b > 10;
      2
      Time taken: 1.605 seconds, Fetched: 1 row(s) 

      Attachments

        Activity

          People

            XuanYuan Yuanjian Li
            XuanYuan Yuanjian Li
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: