Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

Spark SQL does not support a query with a nested aggregate such as the one below:

      SELECT sum(salary), row_number() OVER (ORDER BY depname), sum(
       sum(salary) FILTER (WHERE enroll_date > '2007-01-01')
      ) FILTER (WHERE depname <> 'sales') OVER (ORDER BY depname DESC) AS "filtered_sum",
       depname
      FROM empsalary GROUP BY depname;

Spark throws the following exception:

      org.apache.spark.sql.AnalysisException
      It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.

PostgreSQL, however, supports this syntax:

      SELECT sum(salary), row_number() OVER (ORDER BY depname), sum(
       sum(salary) FILTER (WHERE enroll_date > '2007-01-01')
      ) FILTER (WHERE depname <> 'sales') OVER (ORDER BY depname DESC) AS "filtered_sum",
       depname
      FROM empsalary GROUP BY depname;
  sum  | row_number | filtered_sum |  depname
-------+------------+--------------+-----------
 25100 |          1 |        22600 | develop
  7400 |          2 |         3500 | personnel
 14600 |          3 |              | sales
(3 rows)
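Until Spark accepts this syntax, the rewrite suggested by the AnalysisException itself is a possible workaround: compute the grouped aggregates in a subquery first, then apply the window function over those already-grouped rows. The sketch below demonstrates the rewrite with sqlite3 so it is self-contained; the sample rows are chosen to reproduce the totals in the PostgreSQL output above, and `CASE WHEN` stands in for `FILTER`, which not every engine accepts on a window aggregate.

```python
import sqlite3

# Hypothetical workaround: push the inner aggregates into a subquery,
# then run the window functions over the grouped result.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE empsalary (depname TEXT, empno INT, salary INT, enroll_date TEXT);
INSERT INTO empsalary VALUES
 ('develop',   10, 5200, '2007-08-01'),
 ('sales',      1, 5000, '2006-10-01'),
 ('personnel',  5, 3500, '2007-12-10'),
 ('sales',      4, 4800, '2007-08-08'),
 ('personnel',  2, 3900, '2006-12-23'),
 ('develop',    7, 4200, '2008-01-01'),
 ('develop',    9, 4500, '2008-01-01'),
 ('sales',      3, 4800, '2007-08-01'),
 ('develop',    8, 6000, '2006-10-01'),
 ('develop',   11, 5200, '2007-08-15');
""")

rows = conn.execute("""
SELECT dep_sum,
       row_number() OVER (ORDER BY depname) AS rn,
       -- CASE WHEN emulates FILTER (WHERE depname <> 'sales')
       sum(CASE WHEN depname <> 'sales' THEN filtered_dep_sum END)
           OVER (ORDER BY depname DESC) AS filtered_sum,
       depname
FROM (
  -- the inner aggregates the exception asks us to move into a subquery
  SELECT depname,
         sum(salary) AS dep_sum,
         sum(CASE WHEN enroll_date > '2007-01-01' THEN salary END)
             AS filtered_dep_sum
  FROM empsalary
  GROUP BY depname
)
ORDER BY depname
""").fetchall()

for r in rows:
    print(r)
```

The rewritten query yields the same three rows as the PostgreSQL output above (25100/22600 for develop, 7400/3500 for personnel, 14600/NULL for sales).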



People

    Assignee: Unassigned
    Reporter: Jiaan Geng (beliefer)
