[SPARK-39895] pyspark drop doesn't accept *cols - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.0.3, 3.3.0, 3.2.2
Fix Version/s: 3.4.0
Component/s: PySpark
Labels:
None

Docs Text:
Support for dropping multiple "Column" objects with PySpark DataFrame.drop
Language:
- Python

Description

PySpark DataFrame.drop has the following signature:

def drop(self, *cols: "ColumnOrName") -> "DataFrame":

However, passing more than one Column object to drop raises a TypeError:

each col in the param list should be a string
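The mismatch between the `ColumnOrName` annotation and the runtime check can be modeled without a Spark cluster. The sketch below is a simplified, hypothetical stand-in for the argument handling in the affected releases: `FakeColumn` and `drop_pre_fix` are illustrative names, not PySpark APIs. The single-argument branch reflects our understanding that one Column was already accepted and that only the multi-argument path insisted on strings.

```python
from typing import List, Union


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column; it only carries a name."""

    def __init__(self, name: str) -> None:
        self.name = name


# Mirrors the annotation in the signature: a column name or a Column object.
ColumnOrName = Union[FakeColumn, str]


def drop_pre_fix(*cols: ColumnOrName) -> List[str]:
    """Simplified model of the pre-fix checks; returns the names that would be dropped."""
    if len(cols) == 1:
        col = cols[0]
        # A single argument may be either a name or a Column.
        if isinstance(col, str):
            return [col]
        if isinstance(col, FakeColumn):
            return [col.name]
        raise TypeError("col should be a string or a Column")
    # With several arguments every one must be a plain string,
    # even though the annotation advertises ColumnOrName.
    names: List[str] = []
    for col in cols:
        if not isinstance(col, str):
            raise TypeError("each col in the param list should be a string")
        names.append(col)
    return names


print(drop_pre_fix(FakeColumn("point")))  # a single Column is accepted
print(drop_pre_fix("point", "count"))     # several names are accepted
try:
    drop_pre_fix(FakeColumn("point"), FakeColumn("count"))
except TypeError as exc:
    print(f"TypeError: {exc}")
```

The model makes the asymmetry visible: the same Column that works on its own is rejected as soon as a second argument is added.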

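Minimal reproducible example:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

values = [("id_1", 5, 9), ("id_2", 5, 1), ("id_3", 4, 3), ("id_1", 3, 3), ("id_2", 4, 3)]
df = spark.createDataFrame(values, "id string, point int, count int")
df.printSchema()
# root
#  |-- id: string (nullable = true)
#  |-- point: integer (nullable = true)
#  |-- count: integer (nullable = true)

# df.count is the DataFrame.count method, not a column, so "count" is selected by name
df.drop(df.point, df["count"])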

/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py in drop(self, *cols)
   2537             for col in cols:
   2538                 if not isinstance(col, str):
-> 2539                     raise TypeError("each col in the param list should be a string")
   2540             jdf = self._jdf.drop(self._jseq(cols))
   2541

TypeError: each col in the param list should be a string
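One way to honor the `ColumnOrName` annotation is to normalize every argument up front, accepting names and Column objects in any mix and rejecting only anything else. The sketch below models that idea in plain Python with a hypothetical `FakeColumn` stand-in and a hypothetical `drop_post_fix` function. It is not the code merged in the linked pull requests, and the error message is illustrative.

```python
from typing import List, Union


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column; it only carries a name."""

    def __init__(self, name: str) -> None:
        self.name = name


ColumnOrName = Union[FakeColumn, str]


def drop_post_fix(*cols: ColumnOrName) -> List[str]:
    """Model of the fixed behavior: any mix of names and Columns is accepted."""
    names: List[str] = []
    for col in cols:
        if isinstance(col, str):
            names.append(col)
        elif isinstance(col, FakeColumn):
            names.append(col.name)
        else:
            # Only arguments that are neither a name nor a Column are rejected.
            raise TypeError(f"expected a str or Column, got {type(col).__name__}")
    return names


print(drop_post_fix(FakeColumn("point"), FakeColumn("count")))  # ['point', 'count']
print(drop_post_fix("id", FakeColumn("point")))                 # ['id', 'point']
```

Reducing Columns to names keeps the model short but glosses over one reason Column arguments matter: after a join, two columns can share a name, and a Column reference identifies which one to drop.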


Issue Links

relates to

SPARK-40087 Support multiple Column drop in R (Resolved)

links to

[Github] Pull Request #37333 (santosh-d3vpl3x)

[Github] Pull Request #37335 (santosh-d3vpl3x)


People

Assignee: Santosh Pingale

Reporter: Santosh Pingale

Votes: 0

Watchers: 2

Dates

Created: 27/Jul/22 09:49

Updated: 12/Dec/22 18:11

Resolved: 11/Aug/22 03:14