Details
Description
When calling the .drop method with a string on a DataFrame that contains a column name with a period in it, an AnalysisException is raised, even when the column being dropped does not itself contain a period. This doesn't happen when dropping using the Column object itself.
>>> import json
>>> ds = {'a': "test", "b.no": "testagain"}
>>> df = sqlContext.jsonRDD(sc.parallelize([json.dumps(ds)]))
>>> df.drop('a')
yields
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/apache-spark/1.6.0/libexec/python/pyspark/sql/dataframe.py", line 1347, in drop
    jdf = self._jdf.drop(col)
  File "/usr/local/Cellar/apache-spark/1.6.0/libexec/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/local/Cellar/apache-spark/1.6.0/libexec/python/pyspark/sql/utils.py", line 51, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"cannot resolve 'b.no' given input columns a, b.no;"
whereas this works:
>>> df.drop(df.a)
DataFrame[b.no: string]
The current workaround, if you want to drop a column using a string, is:
>>> df.drop(df.select("a")[0])
DataFrame[b.no: string]
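The error message suggests the analyzer is splitting the dotted name into parts before matching it against the schema, so a literal column named "b.no" is misread as field "no" of a column "b". The sketch below is an illustrative toy resolver, not Spark's actual code; the function name resolve and its splitting logic are assumptions made to show the failure mode in miniature:

```python
def resolve(name, columns):
    """Toy name resolution that splits on '.' before matching,
    loosely mimicking the reported behavior (not Spark internals)."""
    # Splitting first means a literal column name containing a dot
    # can never be matched: we look for a column "b" instead of "b.no".
    head = name.split(".")[0]
    return head if head in columns else None

print(resolve("a", ["a", "b.no"]))     # "a" matches directly
print(resolve("b.no", ["a", "b.no"]))  # None: head "b" is not a column
```

Passing a Column object (df.a, or df.select("a")[0]) sidesteps this string parsing entirely, which is why both workarounds above succeed.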
Issue Links
- duplicates SPARK-12988 "Can't drop columns that contain dots" (Resolved)