Description
Given an existing DataFrame, withColumn() lets me add a column whose name duplicates an existing one. I assume this should be illegal and that withColumn() should reject it, because some functions subsequently fail on the duplicate column names. Example:
sdfCar <- createDataFrame(sqlContext, mtcars)
sdfCar1 <- withColumn(sdfCar, "isEfficient", sdfCar$mpg<=20)
sdfCar1 <- withColumn(sdfCar1, "isEfficient", ifelse(sdfCar1$mpg == sdfCar1$mpg,1,0))
sdfCar2 <- subset(sdfCar1, select=sdfCar1$isEfficient)
- subset() command fails with message: "Reference 'isEfficient' is ambiguous"
Note: I only know that this affects SparkR; it might also affect the other language APIs.
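Until withColumn() handles this case itself, a caller-side check can catch the collision before the column is added. The following is only a minimal sketch: checkNewColumnName() is a hypothetical helper, not part of SparkR, and it assumes the existing names are passed in as a character vector (for example from columns() on the DataFrame).

```r
# Hypothetical helper, not part of SparkR: stops if newName is already
# present in the character vector of existing column names.
checkNewColumnName <- function(existing, newName) {
  if (newName %in% existing) {
    stop(sprintf("Column '%s' already exists", newName))
  }
  invisible(newName)
}

# Intended use with the example above (needs a running SparkR session):
# checkNewColumnName(columns(sdfCar1), "isEfficient")
```

Called before the second withColumn() in the example, this would stop with an error instead of silently producing two isEfficient columns.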
Issue Links
- is related to SPARK-12204 Implement drop method for DataFrame in SparkR (Resolved)