Description
PySpark's DataFrame.drop has the following signature:
def drop(self, *cols: "ColumnOrName") -> "DataFrame":
However, although the signature advertises ColumnOrName, passing more than one Column object to drop raises a TypeError:
each col in the param list should be a string
Minimal reproducible example:
values = [("id_1", 5, 9), ("id_2", 5, 1), ("id_3", 4, 3), ("id_1", 3, 3), ("id_2", 4, 3)]
df = spark.createDataFrame(values, "id string, point int, count int")
df.printSchema()
root
 |-- id: string (nullable = true)
 |-- point: integer (nullable = true)
 |-- count: integer (nullable = true)
df.drop(df.point, df["count"])  # df.count would resolve to the DataFrame.count method, not the column
/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py in drop(self, *cols)
   2537             for col in cols:
   2538                 if not isinstance(col, str):
-> 2539                     raise TypeError("each col in the param list should be a string")
   2540             jdf = self._jdf.drop(self._jseq(cols))
   2541

TypeError: each col in the param list should be a string
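Possible workaround until multiple Column arguments are accepted: a minimal sketch that drops the columns one at a time. It assumes that drop with a single Column argument is supported, which the traceback above does not show and which should be checked against the Spark version in use. The helper name drop_columns is hypothetical and not part of PySpark.

```python
from functools import reduce


def drop_columns(df, *cols):
    """Drop each column in turn (hypothetical helper, not a PySpark API).

    Each element of cols is passed to df.drop on its own, so the
    multi-argument branch that raises the TypeError is never reached.
    Assumes df.drop accepts a single column name or a single Column.
    """
    return reduce(lambda acc, col: acc.drop(col), cols, df)
```

With the DataFrame from the example above, this would be called as drop_columns(df, df.point, df["count"]). The count column is referenced as df["count"] because the attribute df.count resolves to the DataFrame.count method rather than to the column.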
Issue Links
- relates to: SPARK-40087 Support multiple Column drop in R (Resolved)