Spark / SPARK-10894

Add 'drop' support for DataFrame's subset function


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SparkR
    • Labels: None

    Description

      A SparkR DataFrame can be subset to select one or more columns of the dataset. The current '[' implementation ignores the 'drop' argument when only one column is requested, which is inconsistent with the R syntax:
      x[i, j, ... , drop = TRUE]

      1. In R, when drop is FALSE, the result remains a data.frame:
        > class(iris[, "Sepal.Width", drop=F])
        [1] "data.frame"
      2. When drop is TRUE (the default), the result is dropped to a vector:
        > class(iris[, "Sepal.Width", drop=T])
        [1] "numeric"
        > class(iris[,"Sepal.Width"])
        [1] "numeric"

      > df <- createDataFrame(sqlContext, iris)

      1. In SparkR, the 'drop' argument has no effect:
        > class(df[,"Sepal_Width", drop=F])
        [1] "DataFrame"
        attr(,"package")
        [1] "SparkR"
      2. With drop set to TRUE (explicitly or by default), the result should have been dropped to a Column instead:
        > class(df[,"Sepal_Width", drop=T])
        [1] "DataFrame"
        attr(,"package")
        [1] "SparkR"
        > class(df[,"Sepal_Width"])
        [1] "DataFrame"
        attr(,"package")
        [1] "SparkR"

      We should add 'drop' support to the SparkR '[' method.
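
      The requested semantics can be sketched in base R. This is only an illustration on a plain data.frame, not the SparkR implementation: the helper name subset_cols is hypothetical, and in SparkR the single-column branch would presumably return a Column (what df$Sepal_Width yields) rather than a numeric vector.

```r
# Hypothetical helper (not part of SparkR or base R): mimics the requested
# 'drop' semantics on a base R data.frame.
subset_cols <- function(x, j, drop = TRUE) {
  # Select without dropping, so 'out' is always a data.frame at this point.
  out <- x[, j, drop = FALSE]
  if (drop && ncol(out) == 1L) {
    # Exactly one column and drop requested: return the column itself.
    out[[1L]]
  } else {
    out
  }
}

class(subset_cols(iris, "Sepal.Width"))                # "numeric"
class(subset_cols(iris, "Sepal.Width", drop = FALSE))  # "data.frame"
class(subset_cols(iris, c("Sepal.Width", "Species")))  # "data.frame"
```

      Note that 'drop' only matters when a single column is selected; a multi-column selection stays a data.frame (or a DataFrame in SparkR) regardless.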


            People

              Assignee: Unassigned
              Reporter: Weiqiang Zhuang (adrian555)
              Votes: 0
              Watchers: 6
