Description
I would like to chain custom transformations, as suggested in this blog post.
This would allow writing something like the following:
from pyspark.sql.functions import lit

def with_greeting(df):
    return df.withColumn("greeting", lit("hi"))

def with_something(df, something):
    return df.withColumn("something", lit(something))

data = [("jose", 1), ("li", 2), ("liz", 3)]
source_df = spark.createDataFrame(data, ["name", "age"])

actual_df = (source_df
    .transform(with_greeting)
    .transform(lambda df: with_something(df, "crazy")))

actual_df.show()
+----+---+--------+---------+
|name|age|greeting|something|
+----+---+--------+---------+
|jose|  1|      hi|    crazy|
|  li|  2|      hi|    crazy|
| liz|  3|      hi|    crazy|
+----+---+--------+---------+
The only thing needed to accomplish this is the following simple method for DataFrame:
from pyspark.sql.dataframe import DataFrame

def transform(self, f):
    return f(self)

DataFrame.transform = transform
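As a possible refinement (not part of the proposal above, just a sketch), the method could forward extra positional and keyword arguments to f, which would make the wrapping lambda in the example unnecessary:

from pyspark.sql.dataframe import DataFrame

def transform(self, f, *args, **kwargs):
    # Pass the DataFrame as the first argument to f, forwarding any
    # extra arguments, so parameterized transformations need no lambda.
    return f(self, *args, **kwargs)

DataFrame.transform = transform

# The chained call from the example above then becomes:
actual_df = (source_df
    .transform(with_greeting)
    .transform(with_something, "crazy"))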
I volunteer to do the pull request if approved (at least the Python part).
Issue Links
- is duplicated by SPARK-30670 Pipes for PySpark (Resolved)