Description
As of 3.0.0, higher-order functions are available in SQL and Scala, but not in PySpark, forcing Python users to invoke them through expr, selectExpr or sql (see the sketch below).
This is error-prone and poorly documented. Spark should provide pyspark.sql wrappers that accept plain Python functions (within the limits of (*Column) -> Column) as arguments.
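For comparison, the current expr-based invocation looks roughly like this (a minimal sketch, assuming a DataFrame df with an array column named values):

# Today the higher-order function has to be written as a SQL string inside expr
from pyspark.sql.functions import expr

df.select(expr("transform(values, x -> trim(upper(x)))"))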
With the proposed wrappers, the same logic could be expressed in plain Python, for example:

df.select(transform("values", lambda c: trim(upper(c))))

def increment_values(k: Column, v: Column) -> Column:
    return v + 1

df.select(transform_values("data", increment_values))
Issue Links
- is related to SPARK-27297 Add higher order functions to Scala API (Resolved)
- relates to SPARK-30682 Add higher order functions API to SparkR (Resolved)