Spark / SPARK-33196

Expose filtered aggregation API


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Do
    • Affects Version: 3.0.0
    • Fix Version: None
    • Component: SQL
    • Labels: None

    Description

      Spark currently supports filtered aggregation but does not expose an API for using it through the `spark.sql.functions` package.

      It is possible to use it when writing SQL directly:

      scala> val df = spark.range(100)
      scala> df.createOrReplaceTempView("df")
      scala> spark.sql("select count(1) as classic_cnt, count(1) FILTER (WHERE id < 50) from df").show()
      +-----------+-------------------------------------------------+ 
      |classic_cnt|count(1) FILTER (WHERE (id < CAST(50 AS BIGINT)))|
      +-----------+-------------------------------------------------+
      |        100|                                               50|
      +-----------+-------------------------------------------------+
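      Until such an API is exposed, the same filtered aggregate can be reached from the DataFrame API by wrapping the SQL fragment in `expr`. A minimal sketch, assuming a running `SparkSession` named `spark` and that `expr` accepts the same FILTER syntax as the SQL parser:

      ```scala
      import org.apache.spark.sql.functions.{count, expr, lit}

      val df = spark.range(100)

      // The FILTER clause is parsed as a SQL string by expr(),
      // rather than being built with typed Column operators.
      df.agg(
        count(lit(1)).as("classic_cnt"),
        expr("count(1) FILTER (WHERE id < 50)").as("filtered_cnt")
      ).show()
      ```

      This works, but it forfeits compile-time checking of the filter expression, which is part of the motivation for a first-class API.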

      These aggregations are especially useful when aggregating over overlapping subsets of the data, where a pivot would not work because pivot buckets must be disjoint:

      SELECT 
       AVG(revenue) FILTER (WHERE age < 25),
       AVG(revenue) FILTER (WHERE age < 35),
       AVG(revenue) FILTER (WHERE age < 45)
      FROM people;
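      The overlapping-buckets query above hints at what an exposed API could look like. The shape below is purely hypothetical — a `.filter(...)` method on aggregate columns is an assumption for illustration, not an existing `spark.sql.functions` API — and assumes a `people` DataFrame with `revenue` and `age` columns:

      ```scala
      // Hypothetical API sketch: Column.filter on an aggregate does not exist today.
      people.agg(
        avg($"revenue").filter($"age" < 25),
        avg($"revenue").filter($"age" < 35),
        avg($"revenue").filter($"age" < 45)
      )
      ```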

      I did not find an existing issue tracking this, so I am creating this one and will attach a PR illustrating a possible implementation.


          People

            Assignee: Unassigned
            Reporter: Erwan Guyomarc'h (tools4origins)
            Votes: 0
            Watchers: 2
