Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-31663

Grouping sets with having clause returns the wrong result

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0
    • Fix Version/s: 2.4.6, 3.0.0
    • Component/s: SQL
    • Labels:

      Description

      Grouping sets with having clause returns the wrong result when the condition of having contained conflicting naming. See the below example:

      select sum(a) as b FROM VALUES (1, 10), (2, 20) AS T(a, b) group by GROUPING SETS ((b), (a, b)) having b > 10

      The `b` in `having b > 10` should be resolved as `T.b` not `sum(a)`, so the right result should be

      +---+
      |  b|
      +---+
      |  2|
      |  2|
      +---+

      instead of an empty result.

      The root cause is similar to SPARK-31519, it's caused by we parsed HAVING as Filter(..., Agg(...)) and resolved these two parts in different rules. The CUBE and ROLLUP have the same issue.

      Other systems worked as expected, I checked PostgreSQL 9.6 and MS SQL Server 2017.

       

      For Apache Spark 2.0.2 ~ 2.3.4, the following query is tested.

      spark-sql> select sum(a) as b from t group by b grouping sets(b) having b > 10;
      Time taken: 0.194 seconds
      
      hive> select sum(a) as b from t group by b grouping sets(b) having b > 10;
      2
      Time taken: 1.605 seconds, Fetched: 1 row(s) 

        Attachments

          Activity

            People

            • Assignee:
              XuanYuan Yuanjian Li
              Reporter:
              XuanYuan Yuanjian Li
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: