SPARK-12916

Support Row.fromSeq and Row.toSeq methods in pyspark

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: PySpark, SQL

      Description

      PySpark should also have access to Row methods such as fromSeq and toSeq, which are exposed in the Scala API:
      https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Row

      This would be useful when constructing custom columns from functions called on DataFrames. A good example appears in the following Stack Overflow thread:

      http://stackoverflow.com/questions/32196207/derive-multiple-columns-from-a-single-column-in-a-spark-dataframe

      import org.apache.spark.sql.types._
      import org.apache.spark.sql.Row
      
      def foobarFunc(x: Long, y: Double, z: String): Seq[Any] = 
        Seq(x * y, z.head.toInt * y)
      
      val schema = StructType(df.schema.fields ++
        Array(StructField("foo", DoubleType), StructField("bar", DoubleType)))
      
      val rows = df.rdd.map(r => Row.fromSeq(
        r.toSeq ++
        foobarFunc(r.getAs[Long]("x"), r.getAs[Double]("y"), r.getAs[String]("z"))))
      
      val df2 = sqlContext.createDataFrame(rows, schema)
      
      df2.show
      // +---+----+---+----+-----+
      // |  x|   y|  z| foo|  bar|
      // +---+----+---+----+-----+
      // |  1| 3.0|  a| 3.0|291.0|
      // |  2|-1.0|  b|-2.0|-98.0|
      // |  3| 0.0|  c| 0.0|  0.0|
      // +---+----+---+----+-----+
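
      For comparison, here is a rough Python sketch of the same toSeq/fromSeq round trip. It relies on the fact that pyspark.sql.Row is a tuple subclass, and uses collections.namedtuple as a stand-in for Row so that it runs without a Spark installation. The helpers to_seq and from_seq are hypothetical names chosen for illustration, not part of the PySpark API, and the input values are taken from the output table above.

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row (assumption: like Row, a namedtuple is a
# tuple subclass with named fields), so the sketch runs without Spark.
InputRow = namedtuple("InputRow", ["x", "y", "z"])
OutputRow = namedtuple("OutputRow", ["x", "y", "z", "foo", "bar"])


def foobar_func(x, y, z):
    """Python port of foobarFunc from the Scala snippet."""
    return (x * y, ord(z[0]) * y)


def to_seq(row):
    """Hypothetical helper: rough analogue of Scala's Row.toSeq."""
    return list(row)


def from_seq(row_type, values):
    """Hypothetical helper: rough analogue of Scala's Row.fromSeq."""
    return row_type(*values)


def add_columns(row):
    """Append the derived foo and bar values to an existing row."""
    extra = foobar_func(row.x, row.y, row.z)
    return from_seq(OutputRow, to_seq(row) + list(extra))


rows = [InputRow(1, 3.0, "a"), InputRow(2, -1.0, "b"), InputRow(3, 0.0, "c")]
result = [add_columns(r) for r in rows]
```

      With a real Spark session, the mapped tuples would presumably be passed to createDataFrame together with an extended schema, as in the Scala version.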
      

      I am ready to work on this feature.

            People

            • Assignee: Unassigned
            • Reporter: Shubhanshu Mishra (shubhanshumishra@gmail.com)
            • Shepherd: Shivram Mani
            • Votes: 0
            • Watchers: 5
