Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 2.0.0
    • Component/s: SparkR
    • Labels:
      None

      Description

      gapply() applies an R function to each group of a DataFrame, where the groups are defined by one or more columns, and returns a DataFrame. It is similar to GroupedDataSet.flatMapGroups() in the Dataset API.

      Two API styles are supported:

      1. On grouped data:

      gd <- groupBy(df, col1, ...)
      gapply(gd, function(grouping_key, group) {}, schema)

      2. On a DataFrame, with the grouping columns passed directly:

      gapply(df, grouping_columns, function(grouping_key, group) {}, schema)

      R function input: the grouping key values, and a local data.frame holding the rows of that group
      R function output: a local data.frame

      The schema specifies the Row format of the R function's output. It must match the data.frame that the R function returns.
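
The function contract and the two call styles described above can be sketched as follows. This is only an illustration under stated assumptions: the column names (dept, salary), the helper avg_by_group and the wrapper run_with_sparkr are hypothetical, and the SparkR calls assume a Spark 2.0+ installation with a session available, so they are wrapped in a function that is not invoked here. The local part uses base R split() to imitate what each group would look like.

```r
# Hypothetical per-group function: 'key' is a list of grouping key values,
# 'group' is a local data.frame with the rows of that group.
# It returns a local data.frame whose columns match the declared schema.
avg_by_group <- function(key, group) {
  data.frame(dept = key[[1]],
             avg_salary = mean(group$salary),
             stringsAsFactors = FALSE)
}

# Local imitation of the grouping step with base R (no Spark needed).
local_df <- data.frame(dept = c("a", "a", "b"),
                       salary = c(10, 20, 30),
                       stringsAsFactors = FALSE)
pieces <- lapply(split(local_df, local_df$dept),
                 function(g) avg_by_group(list(g$dept[1]), g))
local_result <- do.call(rbind, pieces)

# SparkR usage, assuming a working Spark 2.0+ setup. Not called here.
run_with_sparkr <- function() {
  library(SparkR)
  sparkR.session()
  df <- createDataFrame(local_df)
  # The schema describes the output of avg_by_group, column by column.
  schema <- structType(structField("dept", "string"),
                       structField("avg_salary", "double"))
  # Style 1: group first, then apply.
  out1 <- gapply(groupBy(df, "dept"), avg_by_group, schema)
  # Style 2: pass the grouping columns directly.
  out2 <- gapply(df, "dept", avg_by_group, schema)
  list(collect(out1), collect(out2))
}
```

Both styles should produce the same rows; the order of rows in the collected result is not guaranteed.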

      Note that map-side combination (partial aggregation) is not supported; users can do map-side combination themselves via dapply().
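
One way a user might do the map-side combination by hand is to compute partial aggregates per partition with dapply() and then merge them per key with gapply(). The sketch below is an assumption about how that could look, not something the issue prescribes: partial_agg, merge_partials, run_partial_aggregation and the dept/salary columns are hypothetical names, and the SparkR part assumes a Spark 2.0+ session, so it is wrapped in a function that is not invoked here. The local part imitates two partitions with plain data.frames.

```r
# Hypothetical map-side step: reduce one partition to (dept, total, n).
partial_agg <- function(part) {
  if (nrow(part) == 0) {
    return(data.frame(dept = character(0), total = numeric(0),
                      n = integer(0), stringsAsFactors = FALSE))
  }
  sums <- tapply(part$salary, part$dept, sum)
  counts <- tapply(part$salary, part$dept, length)
  data.frame(dept = names(sums),
             total = as.numeric(sums),
             n = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Hypothetical reduce-side step: merge the partial aggregates of one key.
merge_partials <- function(key, group) {
  data.frame(dept = key[[1]],
             avg_salary = sum(group$total) / sum(group$n),
             stringsAsFactors = FALSE)
}

# Local imitation with two "partitions" (no Spark needed).
part1 <- data.frame(dept = c("a", "b"), salary = c(10, 30),
                    stringsAsFactors = FALSE)
part2 <- data.frame(dept = "a", salary = 20, stringsAsFactors = FALSE)
partials <- rbind(partial_agg(part1), partial_agg(part2))
merged <- do.call(rbind, lapply(split(partials, partials$dept),
                                function(g) merge_partials(list(g$dept[1]), g)))

# SparkR usage, assuming a working Spark 2.0+ setup. Not called here.
run_partial_aggregation <- function(df) {
  library(SparkR)
  partial_schema <- structType(structField("dept", "string"),
                               structField("total", "double"),
                               structField("n", "integer"))
  final_schema <- structType(structField("dept", "string"),
                             structField("avg_salary", "double"))
  # dapply() runs partial_agg once per partition, before any shuffle.
  partial_df <- dapply(df, partial_agg, partial_schema)
  # gapply() then shuffles only the much smaller partial results.
  gapply(partial_df, "dept", merge_partials, final_schema)
}
```

Carrying the sum and the count separately keeps the final average exact; averaging per-partition averages would not.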

            People

            • Assignee:
              Narine Kokhlikyan (Narine)
              Reporter:
              Sun Rui (sunrui)