Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Environment: Spark 2.3.0, Python 3.5.3
Description
I'm not sure whether this is a bug or user error, but I've noticed that referencing a column with the col function ignores a previous call to drop.
import pyspark.sql.functions as F

df = spark.createDataFrame([(1, 3, 5), (2, None, 7), (0, 3, 2)], ['a', 'b', 'c'])
df.show()
+---+----+---+
|  a|   b|  c|
+---+----+---+
|  1|   3|  5|
|  2|null|  7|
|  0|   3|  2|
+---+----+---+

df = df.drop('c')

# the col function is able to see the 'c' column even though it has been dropped
df.where(F.col('c') < 6).show()
+---+---+
|  a|  b|
+---+---+
|  1|  3|
|  0|  3|
+---+---+

# trying the same with brackets on the data frame fails with the expected error
df.where(df['c'] < 6).show()
Py4JJavaError: An error occurred while calling o36909.apply.
: org.apache.spark.sql.AnalysisException: Cannot resolve column name "c" among (a, b);
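A likely explanation for the difference: df['c'] is resolved eagerly against that DataFrame's schema, while F.col('c') builds an unresolved column reference that the analyzer resolves later against the full query plan, where the dropped column may still be reachable through the Project node that drop introduces. If callers want the strict behavior of bracket access with col-style call sites, one defensive sketch is to validate the name against the DataFrame's own columns first. Here safe_col is a hypothetical helper, not part of the PySpark API; it only relies on the standard df.columns attribute and bracket access:

```python
def safe_col(df, name):
    """Hypothetical helper: resolve `name` strictly against this DataFrame's
    schema, mimicking the AnalysisException that bracket access raises,
    instead of returning an unresolved F.col() reference."""
    if name not in df.columns:
        raise ValueError(
            "Cannot resolve column name %r among (%s)"
            % (name, ", ".join(df.columns))
        )
    # Bracket access is validated against df's own columns, so a dropped
    # column can never leak back in through later analysis.
    return df[name]
```

With the DataFrame from the transcript, `df.where(safe_col(df, 'c') < 6)` would raise immediately after the drop rather than silently filtering on the dropped column.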