SPARK-42444: DataFrame.drop should handle multiple columns properly


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: PySpark
    • Labels: None

    Description

      from pyspark.sql import Row
      df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
      df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
      df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()
      

      This works in 3.3:

      +------+
      |height|
      +------+
      |    85|
      |    80|
      +------+
      

      but fails in 3.4:

      ---------------------------------------------------------------------------
      AnalysisException                         Traceback (most recent call last)
      Cell In[1], line 4
            2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
            3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
      ----> 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()
      
      File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in DataFrame.drop(self, *cols)
         4911     jcols = [_to_java_column(c) for c in cols]
         4912     first_column, *remaining_columns = jcols
      -> 4913     jdf = self._jdf.drop(first_column, self._jseq(remaining_columns))
         4915 return DataFrame(jdf, self.sparkSession)
      
      File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
         1316 command = proto.CALL_COMMAND_NAME +\
         1317     self.command_header +\
         1318     args_command +\
         1319     proto.END_COMMAND_PART
         1321 answer = self.gateway_client.send_command(command)
      -> 1322 return_value = get_return_value(
         1323     answer, self.gateway_client, self.target_id, self.name)
         1325 for temp_arg in temp_args:
         1326     if hasattr(temp_arg, "_detach"):
      
      File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in capture_sql_exception.<locals>.deco(*a, **kw)
          155 converted = convert_exception(e.java_exception)
          156 if not isinstance(converted, UnknownException):
          157     # Hide where the exception came from that shows a non-Pythonic
          158     # JVM exception message.
      --> 159     raise converted from None
          160 else:
          161     raise
      
      AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].
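
      The traceback suggests that drop() now converts every argument to a Column and resolves it against the joined plan, where the bare string `name` matches two attributes. A possible workaround on affected 3.4 builds (a sketch only, not verified against every snapshot; the `joined` variable and the getOrCreate() session are just for a standalone run) is to drop each column through a qualified Column reference taken from the DataFrame it came from, instead of passing several string names to one drop() call:

      from pyspark.sql import Row, SparkSession

      spark = SparkSession.builder.getOrCreate()

      df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
      df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])

      joined = df1.join(df2, df1.name == df2.name, "inner")

      # Each qualified Column points at exactly one of the two `name` attributes,
      # so no ambiguous string lookup is involved.
      joined.drop(df1.name).drop(df2.name).drop(df1.age).show()

      This should give the same single-column height result shown for 3.3 above.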
      
      


    People

      Assignee: Ruifeng Zheng (podongfeng)
      Reporter: Ruifeng Zheng (podongfeng)
      Votes: 0
      Watchers: 3
