Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
A SparkR DataFrame can be subset to select one or more columns of the dataset. The current '[' implementation ignores the 'drop' argument when only one column is requested. This is inconsistent with the R syntax:
x[i, j, ... , drop = TRUE]
- In R, when drop is FALSE, the result remains a data.frame:
> class(iris[, "Sepal.Width", drop=F])
[1] "data.frame"
- When drop is TRUE (the default), the result is dropped to a vector:
> class(iris[, "Sepal.Width", drop=T])
[1] "numeric"
> class(iris[,"Sepal.Width"])
[1] "numeric"
> df <- createDataFrame(sqlContext, iris)
- In SparkR, the 'drop' argument has no effect:
> class(df[,"Sepal_Width", drop=F])
[1] "DataFrame"
attr(,"package")
[1] "SparkR"
- The following should have been dropped to a Column instead:
> class(df[,"Sepal_Width", drop=T])
[1] "DataFrame"
attr(,"package")
[1] "SparkR"
> class(df[,"Sepal_Width"])
[1] "DataFrame"
attr(,"package")
[1] "SparkR"
We should add support for the 'drop' argument.
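The requested semantics can be sketched in base R. This is only an illustration on a plain data.frame: `subset_cols` is a hypothetical helper, not SparkR's actual '[' method, and a real SparkR implementation would return a Column rather than a vector when dropping.

```r
# Hypothetical helper, not SparkR code: mimics how `[` treats `drop`
# on a plain data.frame when selecting columns by name.
subset_cols <- function(x, j, drop = TRUE) {
  out <- x[, j, drop = FALSE]      # always keep the data.frame shape first
  if (drop && ncol(out) == 1L) {
    out[[1L]]                      # single column and drop = TRUE: return the bare column
  } else {
    out                            # otherwise stay a data.frame
  }
}

class(subset_cols(iris, "Sepal.Width"))                 # "numeric"
class(subset_cols(iris, "Sepal.Width", drop = FALSE))   # "data.frame"
class(subset_cols(iris, c("Sepal.Width", "Species")))   # "data.frame"
```

Dropping applies only when a single column is selected; a multi-column selection stays a data.frame regardless of 'drop', which matches base R's behavior.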
Issue Links
- duplicates SPARK-13436 "Add parameter drop to subsetting operator [" (Resolved)