SPARK-30681: Add higher order functions API to PySpark

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.1.0
    • Component/s: PySpark, SQL
    • Labels: None

      Description

      As of 3.0.0, higher order functions are available in SQL and Scala, but not in PySpark, forcing Python users to invoke them through expr, selectExpr or sql.
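
      A minimal sketch of that workaround, assuming a DataFrame df with an array column values: the whole transformation has to be written as a SQL string passed to expr, so mistakes in column or function names only surface at analysis time rather than being caught by Python tooling.

      from pyspark.sql.functions import expr

      # The higher order function and its lambda live inside a SQL string,
      # invisible to linters, type checkers and IDE completion.
      df.select(expr("transform(values, x -> trim(upper(x)))"))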

      This is error-prone and poorly documented. Spark should provide pyspark.sql wrappers that accept plain Python functions (restricted, of course, to signatures of the form (*Column) -> Column) as arguments, for example:

      from pyspark.sql import Column
      from pyspark.sql.functions import transform, transform_values, trim, upper

      # Transform each array element with an inline lambda.
      df.select(transform("values", lambda c: trim(upper(c))))

      # Transform map values with a named function.
      def increment_values(k: Column, v: Column) -> Column:
          return v + 1

      df.select(transform_values("data", increment_values))


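      Since the fix version is 3.1.0, end-to-end usage of such wrappers would look roughly like the sketch below; the SparkSession setup and the sample values and data columns are illustrative assumptions rather than part of the ticket.

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import transform, transform_values, trim, upper

      spark = SparkSession.builder.getOrCreate()

      # Illustrative data: an array column ("values") and a map column ("data").
      df = spark.createDataFrame(
          [(["  a ", " b"], {"x": 1, "y": 2})],
          ["values", "data"],
      )

      df.select(
          transform("values", lambda c: trim(upper(c))).alias("cleaned"),
          transform_values("data", lambda k, v: v + 1).alias("incremented"),
      ).show(truncate=False)
      # cleaned: [A, B]    incremented: {x -> 2, y -> 3}
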
              People

              • Assignee: Maciej Szymkiewicz (zero323)
              • Reporter: Maciej Szymkiewicz (zero323)
              • Votes: 0
              • Watchers: 5
