[SPARK-27463] Support Dataframe Cogroup via Pandas UDFs - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 3.0.0
Component/s: PySpark, SQL
Labels: None

Description

Recent work on Pandas UDFs in Spark has improved interoperability between Pandas and Spark. This proposal extends that work by introducing a new Pandas UDF type that allows a cogroup operation to be applied to two PySpark DataFrames.
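
To make the intended semantics concrete, here is a minimal sketch of what a cogroup followed by a per-key function does. It is not the proposed Spark API: `cogroup_apply` and `summarize` are hypothetical names, and plain lists of dicts stand in for the per-group Pandas DataFrames so that the example has no dependencies. It also assumes that a key present on only one side still triggers a call, with an empty group for the other side.

```python
from collections import defaultdict


def cogroup_apply(left, right, key, func):
    """Group both inputs by `key`, then call func(key, left_rows, right_rows) once per key.

    Hypothetical helper for illustration only; in the proposal the two groups
    would be handed to the UDF as Pandas DataFrames.
    """
    left_groups = defaultdict(list)
    right_groups = defaultdict(list)
    for row in left:
        left_groups[row[key]].append(row)
    for row in right:
        right_groups[row[key]].append(row)

    out = []
    # Assumption: a key that appears on only one side is still processed,
    # with an empty list standing in for the missing group.
    for k in sorted(set(left_groups) | set(right_groups)):
        out.extend(func(k, left_groups.get(k, []), right_groups.get(k, [])))
    return out


def summarize(k, left_rows, right_rows):
    # Example user function: report how many rows each side contributed.
    return [{"id": k, "n_left": len(left_rows), "n_right": len(right_rows)}]


left = [{"id": 1, "v": 10}, {"id": 1, "v": 20}, {"id": 2, "v": 30}]
right = [{"id": 1, "w": 0.5}, {"id": 3, "w": 1.5}]

result = cogroup_apply(left, right, "id", summarize)
for row in result:
    print(row)
```

The point of the sketch is that the user function sees both groups for a key at once, which is what separates a cogroup from applying a grouped UDF to each DataFrame independently.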

Full details are in the Google document linked below.