Spark / SPARK-24835

col function ignores drop

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Environment: Spark 2.3.0, Python 3.5.3

    Description

      Not sure if this is a bug or user error, but I've noticed that referencing a column with the col function still works after that column has been removed by a previous call to drop.

      import pyspark.sql.functions as F
      
      df = spark.createDataFrame([(1,3,5), (2, None, 7), (0, 3, 2)], ['a', 'b', 'c'])
      df.show()
      
      +---+----+---+
      |  a|   b|  c|
      +---+----+---+
      |  1|   3|  5|
      |  2|null|  7|
      |  0|   3|  2|
      +---+----+---+
      
      df = df.drop('c')
      
      # the col function is able to see the 'c' column even though it has been dropped
      df.where(F.col('c') < 6).show()
      
      +---+---+
      |  a|  b|
      +---+---+
      |  1|  3|
      |  0|  3|
      +---+---+
      
      # trying the same with brackets on the data frame fails with the expected error
      df.where(df['c'] < 6).show()
      
      Py4JJavaError: An error occurred while calling o36909.apply.
      : org.apache.spark.sql.AnalysisException: Cannot resolve column name "c" among (a, b);
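
      One possible workaround, until the behaviour is clarified, is to check the column names explicitly before filtering. The helper below is only a sketch: require_columns is a hypothetical name, not part of PySpark, and it relies solely on the DataFrame's columns attribute. It compares names case-sensitively, which is stricter than Spark's default case-insensitive resolution.

```python
def require_columns(df, *names):
    """Return df unchanged if every name is listed in df.columns.

    Hypothetical helper (not a PySpark API): raises ValueError for any
    missing name, so a filter on a dropped column fails loudly.
    """
    # Case-sensitive comparison against the DataFrame's current schema.
    missing = [name for name in names if name not in df.columns]
    if missing:
        raise ValueError(
            "Column(s) {} not found among {}".format(missing, list(df.columns))
        )
    return df

# Usage with the DataFrame from the report, after df = df.drop('c'):
#   require_columns(df, 'c').where(F.col('c') < 6).show()
# This raises ValueError instead of silently filtering on the dropped column.
```

      Because the helper returns the DataFrame it was given, it can be chained in front of where without changing the rest of the expression.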

          People

            Assignee: Unassigned
            Reporter: Michael Souder (msouder)
            Votes: 0
            Watchers: 3
