SPARK-30681

Add higher order functions API to PySpark


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.1.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      As of 3.0.0, higher-order functions are available in SQL and Scala but not in PySpark, which forces Python users to invoke them through expr, selectExpr, or sql.

      This is error-prone and not well documented. Spark should provide pyspark.sql wrappers that accept plain Python functions as arguments, within the limits of a (*Column) -> Column signature.

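      from pyspark.sql import Column
      from pyspark.sql.functions import transform, transform_values, trim, upper

      # Apply a Python lambda to every element of the "values" array column.
      df.select(transform("values", lambda c: trim(upper(c))))

      def increment_values(k: Column, v: Column) -> Column:
          return v + 1

      # Apply a named Python function to every value of the "data" map column.
      df.select(transform_values("data", increment_values))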
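One way such a wrapper could work is sketched below in plain Python, with no Spark dependency. This is an illustration under stated assumptions, not PySpark's actual implementation: Col is a hypothetical stand-in for Column that only records SQL text, _lambda_sql is a hypothetical helper, and the local transform, transform_values, trim, and upper functions only imitate the proposed names. The idea is to call the Python function once with placeholder columns named after its parameters and render the result as a SQL lambda, which also shows why the function has to stay within (*Column) -> Column.

```python
import inspect


class Col:
    """Hypothetical stand-in for a Column: it only records SQL text."""

    def __init__(self, sql: str):
        self.sql = sql

    def __add__(self, other):
        return Col(f"({self.sql} + {_to_sql(other)})")


def _to_sql(value) -> str:
    # Render a Col as its SQL text and a plain Python literal via repr().
    return value.sql if isinstance(value, Col) else repr(value)


def upper(c: Col) -> Col:
    return Col(f"upper({c.sql})")


def trim(c: Col) -> Col:
    return Col(f"trim({c.sql})")


def _lambda_sql(f, allowed_arities) -> str:
    """Hypothetical helper: render a Python function as a SQL lambda string."""
    params = list(inspect.signature(f).parameters.values())
    positional = (inspect.Parameter.POSITIONAL_ONLY,
                  inspect.Parameter.POSITIONAL_OR_KEYWORD)
    if any(p.kind not in positional for p in params):
        raise ValueError("only plain positional parameters are supported")
    if len(params) not in allowed_arities:
        raise ValueError(f"expected {sorted(allowed_arities)} parameters, "
                         f"got {len(params)}")
    names = [p.name for p in params]
    # Call the function once with placeholder columns named after its parameters.
    result = f(*[Col(name) for name in names])
    if not isinstance(result, Col):
        raise TypeError("the function must return a Col")
    args = names[0] if len(names) == 1 else "(" + ", ".join(names) + ")"
    return f"{args} -> {result.sql}"


def transform(col: str, f) -> Col:
    # The SQL transform function takes (element) or (element, index).
    return Col(f"transform({col}, {_lambda_sql(f, {1, 2})})")


def transform_values(col: str, f) -> Col:
    # The SQL transform_values function takes (key, value).
    return Col(f"transform_values({col}, {_lambda_sql(f, {2})})")


print(transform("values", lambda c: trim(upper(c))).sql)
# prints: transform(values, c -> trim(upper(c)))
print(transform_values("data", lambda k, v: v + 1).sql)
# prints: transform_values(data, (k, v) -> (v + 1))
```

Because the function is invoked only once, at plan-building time, anything that depends on actual row values (Python branching on data, side effects, non-Column return values) cannot be expressed this way; the sketch rejects the cases it can detect.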
      


            People

              Assignee: zero323 Maciej Szymkiewicz
              Reporter: zero323 Maciej Szymkiewicz
              Votes: 0
              Watchers: 5
