[SPARK-35173] Support columns batch adding in PySpark.dataframe - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.2.0
Fix Version/s: 3.3.0
Component/s: PySpark, SQL
Labels: None

Description

Currently, PySpark can only use withColumn to add one column at a time, or to replace an existing column that has the same name. The Scala API can add multiple columns in one pass. [1]

Before this change, users have to chain withColumn calls, one per column, like:

self.df.withColumn("key1", col("key1")).withColumn("key2", col("key2")).withColumn("key3", col("key3"))

With this support, users can use with_columns to complete the same work as one batch operation:

self.df.withColumn(["key1", "key2", "key3"], [col("key1"), col("key2"), col("key3")])
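For comparison, the API released in PySpark 3.3.0 is DataFrame.withColumns, which takes a single dict mapping column names to Column expressions rather than two parallel lists. The sketch below shows one way to bridge the list form used above to that call. The helper name add_columns is hypothetical and not part of PySpark, and the fallback to chained withColumn calls is an assumption about how older versions could be handled.

```python
from functools import reduce


def add_columns(df, names, cols):
    """Add or replace several columns on a DataFrame in one call.

    Hypothetical helper, not part of PySpark: `names` and `cols` are the
    parallel lists from the example above.
    """
    if len(names) != len(cols):
        raise ValueError("names and cols must have the same length")

    if hasattr(df, "withColumns"):
        # PySpark 3.3.0 and later: a single call that takes a dict of
        # column name -> Column expression.
        return df.withColumns(dict(zip(names, cols)))

    # Assumed fallback for older versions: chain withColumn once per pair.
    return reduce(
        lambda acc, pair: acc.withColumn(pair[0], pair[1]),
        zip(names, cols),
        df,
    )
```

A call such as add_columns(df, ["key1", "key2", "key3"], [col("key1"), col("key2"), col("key3")]) would then work on either path. The two paths may not be equivalent when one expression refers to a column name that another entry replaces, because the chained form evaluates each step against the result of the previous one.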

[1] https://github.com/apache/spark/blob/b5241c97b17a1139a4ff719bfce7f68aef094d95/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2402

Issue Links

links to

[Github] Pull Request #32276 (Yikun)

[Github] Pull Request #32431 (Yikun)

[Github] Pull Request #35518 (HyukjinKwon)

People

Assignee: Yikun Jiang

Reporter: Yikun Jiang

Votes: 0

Watchers: 3

Dates

Created: 21/Apr/21 09:30

Updated: 12/Dec/22 18:10

Resolved: 15/Feb/22 00:41