Details
- Type: New Feature
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 2.3.0
- Fix Version/s: None
- Component/s: None
Description
Hi!
From PySpark, I am trying to define a custom aggregator that accumulates state. Is this possible in Spark 2.3?
AFAIK, it is now possible to define a custom UDAF in PySpark since Spark 2.3 (cf. How to define and use a User-Defined Aggregate Function in Spark SQL?), by calling pandas_udf with the PandasUDFType.GROUPED_AGG function type.
However, since pandas_udf only takes a plain function as a parameter, I don't think it is possible to carry state across the aggregation with this approach.
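To illustrate the limitation: a GROUPED_AGG-style UDAF is essentially a function from "all values in a group" to a single result, with no accumulator object that survives between calls. A minimal pure-Python sketch (no Spark; the function name and data are made up for illustration):

```python
# Conceptual sketch: a GROUPED_AGG-style UDAF is just a function that
# receives a group's values all at once and returns one scalar.
# There is no buffer carried between invocations.
def weighted_mean(values, weights):
    # hypothetical aggregation: everything it needs arrives as arguments
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# each group's data is handed to the function in full
groups = {
    "a": ([1.0, 2.0], [1.0, 3.0]),
    "b": ([10.0], [2.0]),
}
result = {k: weighted_mean(v, w) for k, (v, w) in groups.items()}
print(result)  # {'a': 1.75, 'b': 10.0}
```

Because the function is called once per group with the complete data, there is no hook for initializing, updating, or merging intermediate state.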
From Scala, I see it is possible to have stateful aggregation by extending either UserDefinedAggregateFunction or org.apache.spark.sql.expressions.Aggregator, but is there anything similar I can do on the Python side only?
If not, is this planned for a future release?
Thanks!
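For contrast, the stateful contract mentioned above (Scala's Aggregator) can be sketched in pure Python. This is not a PySpark API; the class below only mirrors the zero/reduce/merge/finish shape of org.apache.spark.sql.expressions.Aggregator to show what "carrying state" means:

```python
# Pure-Python sketch of the Aggregator contract: an explicit buffer is
# created (zero), folded row by row (reduce), combined across partial
# results, e.g. from different partitions (merge), then finalized (finish).
class MeanAggregator:
    def zero(self):
        # initial buffer: (running sum, running count)
        return (0.0, 0)

    def reduce(self, buf, value):
        # fold one input value into the buffer
        s, n = buf
        return (s + value, n + 1)

    def merge(self, b1, b2):
        # combine two partial buffers
        return (b1[0] + b2[0], b1[1] + b2[1])

    def finish(self, buf):
        # produce the final result from the buffer
        s, n = buf
        return s / n

agg = MeanAggregator()
part1 = agg.zero()
for v in [1.0, 2.0]:
    part1 = agg.reduce(part1, v)
part2 = agg.zero()
for v in [3.0]:
    part2 = agg.reduce(part2, v)
print(agg.finish(agg.merge(part1, part2)))  # 2.0
```

The buffer passed through reduce and merge is exactly the state that a plain pandas_udf function has no way to express.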
Issue Links
- duplicates
- SPARK-10915 Add support for UDAFs in Python (Closed)