Description
Recent work on Pandas UDFs in Spark has improved interoperability between Pandas and Spark. This proposal extends that work by introducing a new Pandas UDF type that allows a cogroup operation to be applied to two PySpark DataFrames.
Full details are in the Google document linked below.
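To make the intended semantics concrete, here is a minimal pandas-only sketch of a cogroup: both DataFrames are grouped by the same key, and a user function is called once per key with the matching group from each side. The helper `cogroup_apply`, the function `summarize`, and the column names are hypothetical and exist only for illustration; they are not the API proposed in the design document. The sketch also assumes that a key present on only one side still triggers a call, with an empty DataFrame for the other side.

```python
import pandas as pd


def cogroup_apply(left, right, key, func):
    """Hypothetical helper: call func(left_group, right_group) once per key."""
    left_groups = {k: g for k, g in left.groupby(key)}
    right_groups = {k: g for k, g in right.groupby(key)}
    # Assumption: keys found on only one side are still processed,
    # with an empty (zero-row) frame standing in for the missing side.
    keys = sorted(set(left_groups) | set(right_groups))
    results = [
        func(
            left_groups.get(k, left.iloc[0:0]),
            right_groups.get(k, right.iloc[0:0]),
        )
        for k in keys
    ]
    return pd.concat(results, ignore_index=True)


def summarize(l, r):
    """Illustrative user function: one output row per key."""
    key_value = l["id"].iloc[0] if len(l) else r["id"].iloc[0]
    return pd.DataFrame(
        {
            "id": [key_value],
            "left_rows": [len(l)],
            "right_total": [r["v2"].sum()],
        }
    )


left = pd.DataFrame({"id": [1, 1, 2], "v1": [1.0, 2.0, 3.0]})
right = pd.DataFrame({"id": [1, 3], "v2": [10.0, 30.0]})

result = cogroup_apply(left, right, "id", summarize)
print(result)
```

In Spark the same per-key function would run in parallel across the cluster rather than in a local loop. In released PySpark (3.0 and later) this shape is exposed as `df1.groupby("id").cogroup(df2.groupby("id")).applyInPandas(func, schema=...)`.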
Issue Links
- is related to
  - SPARK-29317 Avoid inheritance hierarchy in pandas CoGroup arrow runner and its plan (Resolved)
- relates to
  - SPARK-34319 Self-join after cogroup applyInPandas fails due to unresolved conflicting attributes (Resolved)
- links to
(2 links to)