[SPARK-24835] col function ignores drop - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Component/s: PySpark
Labels:
- bulk-closed
Environment:

Spark 2.3.0

Python 3.5.3

Description

Not sure if this is a bug or user error, but I've noticed that accessing columns with the col function ignores a previous call to drop.

import pyspark.sql.functions as F

df = spark.createDataFrame([(1,3,5), (2, None, 7), (0, 3, 2)], ['a', 'b', 'c'])
df.show()

+---+----+---+
|  a|   b|  c|
+---+----+---+
|  1|   3|  5|
|  2|null|  7|
|  0|   3|  2|
+---+----+---+

df = df.drop('c')

# the col function is able to see the 'c' column even though it has been dropped
df.where(F.col('c') < 6).show()

+---+---+
|  a|  b|
+---+---+
|  1|  3|
|  0|  3|
+---+---+

# trying the same with brackets on the data frame fails with the expected error
df.where(df['c'] < 6).show()

Py4JJavaError: An error occurred while calling o36909.apply.
: org.apache.spark.sql.AnalysisException: Cannot resolve column name "c" among (a, b);
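A possible workaround, sketched under assumptions: check the column names against df.columns before building a filter with F.col, so that a dropped column fails up front the way df['c'] does. The helper name require_columns is hypothetical and not part of PySpark; it relies only on the DataFrame's columns attribute. The comparison is an exact string match, so it may be stricter than Spark's own name resolution, which is case-insensitive by default.

```python
def require_columns(df, names):
    """Return df unchanged if every name is in df.columns, else raise ValueError.

    Hypothetical helper for illustration; not a PySpark API.
    """
    missing = [name for name in names if name not in df.columns]
    if missing:
        raise ValueError(
            "Column(s) %s not found among %s" % (missing, list(df.columns))
        )
    return df


# Intended usage with the DataFrame from the report (after df = df.drop('c')):
#   require_columns(df, ['c']).where(F.col('c') < 6).show()
# The guard raises ValueError before the filter is built.
```

Because the helper returns the DataFrame it was given, it can be chained in front of where or select without otherwise changing the query.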

People

Assignee: Unassigned

Reporter: Michael Souder

Votes: 0

Watchers: 3

Dates

Created: 17/Jul/18 19:04

Updated: 08/Oct/19 05:41

Resolved: 08/Oct/19 05:41