[SPARK-42576] Add 2nd groupBy method to Dataset - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.0
Fix Version/s: 3.4.1
Component/s: Connect
Labels: None
Epic Link: Spark Connect Scala Client - Basic

Description

The Spark Connect Scala client's Dataset is missing the groupBy overload that takes column names instead of Column expressions:

/**
 * Groups the Dataset using the specified columns, so that we can run aggregation on them.
 * See [[RelationalGroupedDataset]] for all the available aggregate functions.
 *
 * This is a variant of groupBy that can only group by existing columns using column names
 * (i.e. cannot construct expressions).
 *
 * {{{
 *   // Compute the average for all numeric columns grouped by department.
 *   ds.groupBy("department").avg()
 *
 *   // Compute the max age and average salary, grouped by department and gender.
 *   ds.groupBy("department", "gender").agg(Map(
 *     "salary" -> "avg",
 *     "age" -> "max"
 *   ))
 * }}}
 * @group untypedrel
 * @since 3.4.0
 */
@scala.annotation.varargs
def groupBy(col1: String, cols: String*): RelationalGroupedDataset
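As a rough illustration of how the name-based overload is meant to be called, here is a minimal sketch. It assumes spark-sql 3.4.x is on the classpath and uses a local (non-Connect) SparkSession purely for convenience; the `Employee` case class, the sample rows, and the object name are hypothetical and do not come from the issue. The Column-based call at the end is shown only for comparison.

```scala
import org.apache.spark.sql.SparkSession

object GroupByNamesExample {
  // Hypothetical record type, used only for this illustration.
  final case class Employee(department: String, gender: String, age: Int, salary: Double)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; a Spark Connect client would build its session differently.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("groupBy-by-name")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val ds = Seq(
      Employee("eng", "f", 30, 120.0),
      Employee("eng", "m", 40, 100.0),
      Employee("ops", "f", 50, 90.0)
    ).toDS()

    // Name-based overload: group by one existing column, average all numeric columns.
    ds.groupBy("department").avg().show()

    // Name-based overload with several columns, followed by a Map-based aggregation.
    ds.groupBy("department", "gender")
      .agg(Map("salary" -> "avg", "age" -> "max"))
      .show()

    // Column-based overload, for comparison: this variant accepts arbitrary expressions.
    ds.groupBy($"department", $"gender")
      .agg(Map("salary" -> "avg", "age" -> "max"))
      .show()

    spark.stop()
  }
}
```

The name-based calls only accept names of existing columns, so anything that needs an expression (for example grouping by a derived value) still has to go through the Column-based variant.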

Issue Links

links to

[Github] Pull Request #40173 (amaliujia)

People

Assignee: Rui Wang

Reporter: Herman van Hövell

Votes: 0

Watchers: 2

Dates

Created: 25/Feb/23 02:35

Updated: 26/Feb/23 02:36

Resolved: 26/Feb/23 02:36