Spark / SPARK-39895

pyspark drop doesn't accept *cols

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.3, 3.3.0, 3.2.2
    • Fix Version/s: 3.4.0
    • Component/s: PySpark
    • Labels: None
    • Support for PySpark to drop multiple "Column"

    Description

      PySpark's DataFrame.drop has the following signature:

      def drop(self, *cols: "ColumnOrName") -> "DataFrame":

      However, passing more than one Column object to drop raises a TypeError:

      each col in the param list should be a string

      Minimal reproducible example:
      values = [("id_1", 5, 9), ("id_2", 5, 1), ("id_3", 4, 3), ("id_1", 3, 3), ("id_2", 4, 3)]
      df = spark.createDataFrame(values, "id string, point int, count int")

      |-- id: string (nullable = true)
      |-- point: integer (nullable = true)
      |-- count: integer (nullable = true)

      df.drop(df.point, df.count)

      /spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py in drop(self, *cols)
         2537             for col in cols:
         2538                 if not isinstance(col, str):
      -> 2539                     raise TypeError("each col in the param list should be a string")
         2540             jdf = self._jdf.drop(self._jseq(cols))
         2541

      TypeError: each col in the param list should be a string
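
On the affected versions, one possible workaround is to drop the columns one at a time, since the string-only check shown in the traceback sits on the multi-argument path. The sketch below assumes that single-argument `drop` accepts a Column on those versions. `drop_columns` is a hypothetical helper name, not part of PySpark.

```python
from functools import reduce


def drop_columns(df, *cols):
    """Drop each of `cols` from `df` with one drop() call per column.

    Hypothetical helper, not a PySpark API. It assumes that
    single-argument `df.drop(col)` accepts either a column name
    or a Column object.
    """
    # Equivalent to df.drop(c1).drop(c2)... for every column given;
    # with no columns, the original DataFrame is returned unchanged.
    return reduce(lambda acc, col: acc.drop(col), cols, df)
```

With the DataFrame from the example above, this could be called as `drop_columns(df, df["point"], df["count"])`. Bracket access is used for `count` because the attribute `df.count` resolves to the `DataFrame.count` method, not to the column.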

          People

            Assignee: Santosh Pingale
            Reporter: Santosh Pingale