Spark / SPARK-42576

Add 2nd groupBy method to Dataset


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.1
    • Component/s: Connect
    • Labels: None

    Description

      Dataset is missing the following groupBy overload, which groups by column names:

      /**
       * Groups the Dataset using the specified columns, so that we can run aggregation on them.
       * See [[RelationalGroupedDataset]] for all the available aggregate functions.
       *
       * This is a variant of groupBy that can only group by existing columns using column names
       * (i.e. cannot construct expressions).
       *
       * {{{
       *   // Compute the average for all numeric columns grouped by department.
       *   ds.groupBy("department").avg()
       *
       *   // Compute the max age and average salary, grouped by department and gender.
       *   ds.groupBy("department", "gender").agg(Map(
       *     "salary" -> "avg",
       *     "age" -> "max"
       *   ))
       * }}}
       * @group untypedrel
       * @since 3.4.0
       */
      @scala.annotation.varargs
      def groupBy(col1: String, cols: String*): RelationalGroupedDataset 
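The shape of this overload (a mandatory `col1: String` followed by `cols: String*`) can be illustrated with a toy model. This is a minimal sketch, not Spark code: `Col`, `Grouped`, and `MiniDataset` are hypothetical stand-ins, and it assumes (plausibly, though the issue does not say so) that the leading `col1` parameter exists to keep the name-based overload distinct from the expression-based `Column*` variant after type erasure. It also shows the "existing columns only" restriction by rejecting unknown names.

```scala
// Hypothetical stand-ins for Column / RelationalGroupedDataset / Dataset,
// used only to illustrate the overload shape; these are not Spark's classes.
final case class Col(name: String)
final case class Grouped(keys: Seq[Col])

final class MiniDataset(val columns: Seq[String]) {
  // Expression-based variant: accepts arbitrary column "expressions".
  def groupBy(cols: Col*): Grouped = Grouped(cols)

  // Name-based variant. Declaring it as `(cols: String*)` instead would clash
  // with `(cols: Col*)`: both erase to `groupBy(Seq)`, which scalac rejects
  // as a double definition. The leading `col1: String` avoids that.
  def groupBy(col1: String, cols: String*): Grouped = {
    val names = col1 +: cols
    // Only existing columns can be referenced by name (no expressions).
    names.foreach { n =>
      require(columns.contains(n), s"cannot resolve column '$n'")
    }
    // Delegate to the expression-based overload.
    groupBy(names.map(Col(_)): _*)
  }
}

object GroupByOverloadDemo {
  def main(args: Array[String]): Unit = {
    val ds = new MiniDataset(Seq("department", "gender", "salary", "age"))
    println(ds.groupBy("department", "gender")) // resolves to the String overload
    println(ds.groupBy(Col("department")))      // resolves to the Col* overload
  }
}
```

Spark's `Dataset.select(col: String, cols: String*)` uses the same leading-parameter pattern alongside `select(cols: Column*)`.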


          People

            amaliujia Rui Wang
            hvanhovell Herman van Hövell
            Votes: 0
            Watchers: 2
