Details
- Type: Umbrella
- Status: Resolved
- Priority: Major
- Resolution: Done
- 3.4.0
- None
- None
Description
User-defined Functions in Python consist of (pickled) Python UDFs and (Arrow-optimized) Pandas UDFs. They enable users to run arbitrary Python code on top of the Apache Spark™ engine. Users only have to state "what to do"; PySpark, as a sandbox, encapsulates "how to do it".
The Spark Connect Python Client (SCPC), a client and server interface for PySpark, will eventually replace the legacy PySpark API. Supporting PySpark UDFs is essential for Spark Connect to reach parity with that legacy API.
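For illustration, the two UDF flavors described above can be sketched as follows. This is a minimal sketch, not code from the ticket or the design doc: the names `add_one` and `demo` are hypothetical, and it assumes PySpark 3.4 or later with pandas and PyArrow installed. The Spark imports live inside `demo` so that the plain Python function can be exercised without a running cluster.

```python
def add_one(x):
    """Plain Python function; Spark pickles it and calls it once per row."""
    return None if x is None else x + 1


def demo(spark):
    """Apply both UDF flavors to a tiny DataFrame (hypothetical helper).

    `spark` is assumed to be an active SparkSession, either a classic one
    or one created through Spark Connect.
    """
    import pandas as pd
    from pyspark.sql.functions import col, pandas_udf, udf

    # Pickled Python UDF: the return type is given as a DDL string.
    add_one_udf = udf(add_one, returnType="long")

    # Pandas UDF: receives a whole pandas Series per batch, transferred via Arrow.
    @pandas_udf("long")
    def add_one_vectorized(values: pd.Series) -> pd.Series:
        return values + 1

    return spark.range(3).select(
        col("id"),
        add_one_udf(col("id")).alias("row_at_a_time"),
        add_one_vectorized(col("id")).alias("vectorized"),
    )
```

With a Spark Connect server available, a session could be created with `SparkSession.builder.remote("sc://localhost").getOrCreate()` (the address is an assumption) and passed to `demo`, followed by `.show()` on the result.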
See design doc here.
Issue Links
- is depended upon by
  - SPARK-42393 Support for Pandas/Arrow Functions API (Resolved)
- is related to
  - SPARK-42271 Reuse UDF test cases under `pyspark.sql.tests` (Resolved)
1. | Minimal support for pickled Python UDFs | Resolved | Unassigned |
2. | Scalar Inline Python UDF in Spark Connect | Resolved | Xinrong Meng |
3. | Pandas UDF in Spark Connect | Resolved | Xinrong Meng |
4. | Accept return type in DDL strings for Python Scalar UDFs in Spark Connect | Resolved | Xinrong Meng |
5. | Reuse UDF test cases under `pyspark.sql.tests` | Resolved | Xinrong Meng |
6. | Standardize registered pickled Python UDFs | Resolved | Xinrong Meng |
7. | Python UDFs with inconsistent client and server versions | Resolved | Unassigned |
8. | Support complex return types in DDL strings | Resolved | Xinrong Meng |
9. | Implement `spark.catalog.registerFunction` | Resolved | Xinrong Meng |
10. | Standardize `__repr__` of CommonInlineUserDefinedFunction | Resolved | Xinrong Meng |
11. | Make `parse_data_type` use new proto message `DDLParse` | Resolved | Ruifeng Zheng |
12. | Register Java (aggregate) user-defined functions | Resolved | Xinrong Meng |
13. | Enable importing `pandas_udf` from `pyspark.sql.connect.functions` | Resolved | Xinrong Meng |