SPARK-30681: Add higher order functions API to PySpark

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.1.0
    • Component/s: PySpark, SQL
    • Labels: None

      Description

      As of 3.0.0, higher order functions are available in SQL and Scala, but not in PySpark, forcing Python users to invoke them through expr, selectExpr or sql.
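
      A minimal sketch of that workaround, assuming a DataFrame df with an array column values: the whole transformation has to be written as a SQL string passed to expr, so mistakes in column or function names only surface at analysis time rather than being caught by Python tooling.

      from pyspark.sql.functions import expr

      # The higher order function and its lambda live inside a SQL string,
      # invisible to linters, type checkers and IDE completion.
      df.select(expr("transform(values, x -> trim(upper(x)))"))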

      This is error-prone and poorly documented. Spark should provide pyspark.sql wrappers that accept plain Python functions (restricted, of course, to signatures of the form (*Column) -> Column) as arguments, for example:

      from pyspark.sql import Column
      from pyspark.sql.functions import transform, transform_values, trim, upper

      # Transform each array element with an inline lambda.
      df.select(transform("values", lambda c: trim(upper(c))))

      # Transform map values with a named function.
      def increment_values(k: Column, v: Column) -> Column:
          return v + 1

      df.select(transform_values("data", increment_values))


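      Since the fix version is 3.1.0, end-to-end usage of such wrappers would look roughly like the sketch below; the SparkSession setup and the sample values and data columns are illustrative assumptions rather than part of the ticket.

      from pyspark.sql import SparkSession
      from pyspark.sql.functions import transform, transform_values, trim, upper

      spark = SparkSession.builder.getOrCreate()

      # Illustrative data: an array column ("values") and a map column ("data").
      df = spark.createDataFrame(
          [(["  a ", " b"], {"x": 1, "y": 2})],
          ["values", "data"],
      )

      df.select(
          transform("values", lambda c: trim(upper(c))).alias("cleaned"),
          transform_values("data", lambda k, v: v + 1).alias("incremented"),
      ).show(truncate=False)
      # cleaned: [A, B]    incremented: {x -> 2, y -> 3}
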
              People

              • Assignee: Maciej Szymkiewicz (zero323)
              • Reporter: Maciej Szymkiewicz (zero323)
              • Votes: 0
              • Watchers: 5
