Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 2.0.0
    • Component/s: SparkR
    • Labels:
      None

      Description

      gapply() applies an R function to groups of a DataFrame formed by grouping on one or more columns, and returns a DataFrame. It is analogous to GroupedDataset.flatMapGroups() in the Dataset API.

      Two API styles are supported:
      1.

      gd <- groupBy(df, col1, ...)
      gapply(gd, function(grouping_key, group) {}, schema)
      

      2.

      gapply(df, grouping_columns, function(grouping_key, group) {}, schema) 
      

      R function input: the grouping key value(s) and a local data.frame containing the rows of that group
      R function output: a local data.frame

      The schema specifies the row format of the R function's output and must match the data.frame that the function returns.

      Note that map-side combination (partial aggregation) is not supported; users can perform map-side combination via dapply().
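
      A minimal usage sketch of the first style (the data, column names, schema, and key handling below are illustrative assumptions, not part of this proposal):

      df <- createDataFrame(sqlContext, mtcars)   # sqlContext assumed to exist

      # The schema must match the data.frame returned by the R function
      schema <- structType(structField("cyl", "double"),
                           structField("avg_mpg", "double"))

      gd <- groupBy(df, "cyl")
      result <- gapply(gd, function(key, group) {
        # key: grouping key value(s); group: local data.frame with this group's rows
        data.frame(cyl = key[[1]], avg_mpg = mean(group$mpg))
      }, schema)

      head(collect(result))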

        Issue Links

          Activity

          Narine Narine Kokhlikyan added a comment - - edited

          Thanks for creating this JIRA, Sun Rui.
          Have you already started to work on this? This most probably depends on https://issues.apache.org/jira/browse/SPARK-12792.
          We need this as soon as possible, and I might start working on it.
          Do you have an estimate of how long it will take to get https://issues.apache.org/jira/browse/SPARK-12792 reviewed?

          cc: Shivaram Venkataraman

          Thanks,
          Narine

          sunrui Sun Rui added a comment - - edited

          Narine Kokhlikyan, yes, this depends on https://issues.apache.org/jira/browse/SPARK-12792. I will do dapply(), and you can feel free to work on this one by creating a working branch based on the PR for SPARK-12792. Could you review the implementation design doc before starting?

          Shivaram Venkataraman, could you help review the PR for SPARK-12792 and merge it ASAP? We need to get SparkR UDF support done in Spark 2.0.

          Narine Narine Kokhlikyan added a comment -

          Thanks for your quick response, Sun Rui. I'll try to review it in detail.

          Narine Narine Kokhlikyan added a comment - - edited

          Hi Sun Rui,

          I looked at the implementation proposal, and it looks good to me. However, I think it would be good to add some details about the aggregation of the data frames we receive from the workers.

          I've tried to draw a diagram for the group-apply example in order to understand the bigger picture:
          https://docs.google.com/document/d/1z-sghU8wYKW-oNOajzFH02X0CP9Vd67cuJ085e93vZ8/edit
          Please let me know if I've misunderstood something.

          Thanks,
          Narine

          Narine Narine Kokhlikyan added a comment -

          Started working on this!

          sunrui Sun Rui added a comment -

          cool

          Narine Narine Kokhlikyan added a comment - - edited

          Hi Sun Rui,

          I have a question regarding your suggestion to add a new "GroupedData.flatMapRGroups" function, as described in the following document:
          https://docs.google.com/presentation/d/1oj17N5JaE8JDjT2as_DUI6LKutLcEHNZB29HsRGL_dM/edit#slide=id.p9

          It seems that some changes have happened in Spark SQL. In 1.6.1 there was a Scala class GroupedData:
          https://github.com/apache/spark/blob/v1.6.1/sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala

          This class doesn't seem to exist in 2.0.0.

          I was thinking of adding the flatMapRGroups helper function to org.apache.spark.sql.KeyValueGroupedDataset or org.apache.spark.sql.RelationalGroupedDataset. What do you think?

          Thank you,
          Narine

          sunrui Sun Rui added a comment -

          Narine Kokhlikyan, yes, https://issues.apache.org/jira/browse/SPARK-13897 changed it. I think we can add a method in KeyValueGroupedDataset, as gapply() is a key-value-style group by rather than a relational-style group by.

          Narine Narine Kokhlikyan added a comment -

          Thanks for the quick response, Sun Rui.

          I was playing with KeyValueGroupedDataset and noticed that it works only for Datasets. When I try groupByKey on a DataFrame, it fails.
          This succeeds:
          val grouped = ds.groupByKey(v => (v._1, "word"))

          But the following fails:
          val grouped = df.groupByKey(v => (v._1, "word"))

          As far as I know, in SparkR we are working with DataFrames, so does this mean that I need to convert the DataFrame to a Dataset and work on Datasets on the Scala side?

          Thanks,
          Narine

          sunrui Sun Rui added a comment -

          Narine Kokhlikyan, DataFrame and Dataset are now converged: a DataFrame is just a different view of a Dataset, namely Dataset<Row>. So groupByKey is the same method on both Dataset and DataFrame, but the `func` differs because the element type differs. For example:

          val ds = Seq((1,2), (3,4)).toDS
          val gd = ds.groupByKey(v=>v._1)
          val df = ds.toDF
          val gd1 = df.groupByKey(r=>r.getInt(0))
          
          Narine Narine Kokhlikyan added a comment -

          Sun Rui, thank you very much for the explanation!
          Now I get it!

          Narine Narine Kokhlikyan added a comment -

          Hi Sun Rui,

          I've made some progress in putting the logical and physical plans together and calling the R workers; however, I still have some questions.
          1. I'm still not quite sure about the number of partitions. As you wrote in https://issues.apache.org/jira/browse/SPARK-6817, we need to tune the number of partitions based on "spark.sql.shuffle.partitions". What exactly do you mean by tuning? Repartitioning?
          2. I have another question about grouping by keys: groupByKey with one key is fine, but if we have more than one key we probably need to introduce a case class. With a case class it looks okay too, but I'm not sure how convenient it is. Any ideas?

          case class KeyData(a: Int, b: Int)
          val gd1 = df.groupByKey(r => KeyData(r.getInt(0), r.getInt(1)))

          Thanks,
          Narine

          sunrui Sun Rui added a comment -

          Narine Kokhlikyan,
          1. Typically users don't care about the number of partitions in Spark SQL. If they do, they can tune it by setting "spark.sql.shuffle.partitions". It doesn't seem related to the implementation of gapply, does it?
          2. I think we need to support groupBy instead of groupByKey for DataFrame. With groupBy, users can specify multiple key columns at once, so a list should be used to hold the key columns.

          FYI, I have basically implemented dapply() and am debugging it.

          Narine Narine Kokhlikyan added a comment -

          Good job on dapply, Sun Rui!
          I'll open a pull request for this soon!

          sunrui Sun Rui added a comment -

          Cool. If possible, could you make it a WIP PR so that I can take a look earlier?

          shivaram Shivaram Venkataraman added a comment -

          Narine Kokhlikyan, any update on this? It would be great to have this in Spark 2.0.

          Narine Narine Kokhlikyan added a comment -

          Hi Shivaram Venkataraman,

          Thanks for asking! I'm trying my best to finish this as soon as possible.

          There is an issue when it later calls mapPartitions in the doExecute method: it seems that for gapply we need to append the grouping columns at the end of each row, similar to https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1260.

          I've also tried to implement my own column appender, but I'm not sure it is the right way to go. Do you have any ideas, Sun Rui?

          Thank you,
          Narine

          shivaram Shivaram Venkataraman added a comment -

          Could you post a WIP pull request using your own column appender? I am not too familiar with the Spark SQL internals, but I think Reynold Xin or Davies Liu will be able to provide feedback if we have a PR up.

          sunrui Sun Rui added a comment -

          Narine Kokhlikyan, does the AppendColumns logical operator (https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala#L150) help?
          Narine Narine Kokhlikyan added a comment -

          Thank you for the quick responses, Shivaram Venkataraman and Sun Rui!
          Sun Rui, I could have used it, but my concern is the Encoder for the keys. I have one implementation where I represent the keys as a Row and try to use RowEncoder. Something like:

          val gfunc = (r: Row) => convertKeysToRow(r, colNames)

          val withGroupingKey = AppendColumns(gfunc, inputPlan)

          But this doesn't really work...
          I'll push all my changes today and at least post the link to my changeset.

          Thank you!

          Narine Narine Kokhlikyan added a comment -

          Hi Sun Rui,

          I've pushed my changes. Here is the link:
          https://github.com/apache/spark/compare/master...NarineK:gapply

          There are some things I can reuse from dapply; I've copied those in for now but will remove them after merging with dapply.

          I think we can use AppendColumnsWithObject, but it fails at line 76 of sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala:
          assert(child.output.length == 1)
          I'm not quite sure why.

          Could you please verify the part that serializes and deserializes the rows?

          Thank you,
          Narine

          Narine Narine Kokhlikyan added a comment - - edited

          I think it would be better to use typed columns.

          Something similar to: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala#L264
          I don't think there is support for typed columns in SparkR, is there?

          In that case we could create an encoder similar to:
          ExpressionEncoder.tuple(ExpressionEncoder[String], ExpressionEncoder[Int], ExpressionEncoder[Double])

          Is there a way to access the mapping between Spark and Scala types? Like:
          IntegerType (Spark) -> Int (Scala)

          Thank you!

          apachespark Apache Spark added a comment -

          User 'NarineK' has created a pull request for this issue:
          https://github.com/apache/spark/pull/12836

          shivaram Shivaram Venkataraman added a comment -

          Resolved by https://github.com/apache/spark/pull/12836
          Narine Narine Kokhlikyan added a comment - - edited

          FYI, Oscar D. Lara Yejas, Alok Singh, Vijay Bommireddipalli
          timhunter Timothy Hunter added a comment -

          Narine Kokhlikyan, while working on a similar function for Python [1], we found it easier to make the following changes:

          • the keys are appended by default to the Spark DataFrame being returned
          • the output schema that the user provides is the schema of the R data.frame and does not include the keys

          Here were our reasons to depart from the R implementation of gapply:

          • in most cases, users will want to know the key associated with a result, so appending the key is the sensible default
          • most functions in the SQL interface and in MLlib append columns, and gapply departs from this philosophy
          • for the cases when users do not need the key, adding it costs only a fraction of the computation time and output size
          • from a formal perspective, it makes calling gapply fully transparent to the type of the key: it is easier to build a function with gapply because it does not need to know anything about the key

          I think it would make sense to make this change to R's gapply implementation. Let me know what you think about it.

          [1] https://github.com/databricks/spark-sklearn/blob/master/python/spark_sklearn/group_apply.py
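
          For illustration only, a minimal sketch of what this proposal might look like on the SparkR side (the data, column names, schema, and the automatic key handling are assumptions about the proposal, not the implemented API):

          df <- createDataFrame(sqlContext, mtcars)   # sqlContext assumed to exist

          # Under the proposal, the schema describes only the R function's output;
          # the grouping key column(s) would be appended to the result automatically.
          schema <- structType(structField("avg_mpg", "double"))

          result <- gapply(df, "cyl", function(key, group) {
            data.frame(avg_mpg = mean(group$mpg))     # no key columns returned here
          }, schema)

          # The returned DataFrame would then contain both "cyl" and "avg_mpg".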

          Narine Narine Kokhlikyan added a comment - - edited

          Thank you, Timothy Hunter, for sharing this information with us.
          It is a nice idea. I think it could be seen as an extension of the current gapply implementation.

          In general, whether the keys are useful depends on the use case. Most probably the user would naturally like to see the matching key for each group's output, so it would make sense to append the keys by default.
          If the user doesn't need the keys, he or she can easily drop those columns.

          timhunter Timothy Hunter added a comment -

          I opened a separate JIRA for that issue: SPARK-16258


            People

            • Assignee:
              Narine Narine Kokhlikyan
              Reporter:
              sunrui Sun Rui
            • Votes:
              0
              Watchers:
              9
