Description
As a Spark user still working with RDDs, I'd like the ability to convert a DataFrame to a typed RDD.
For example, if I've converted RDDs to DataFrames so that I could save them as Parquet or CSV files, I would like to rebuild the RDD from those files automatically rather than writing the row-to-type conversion myself.
case class Food(name: String, count: Int)

val rdd0 = sc.parallelize(Seq(Food("apple", 1), Food("banana", 2), Food("cherry", 3)))
val df0 = rdd0.toDF()
df0.save("foods.parquet")

val df1 = sqlContext.load("foods.parquet")
val rdd1 = df1.toTypedRDD[Food]()
// rdd0 and rdd1 should have the same elements
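For contrast, here is a sketch of the manual row-to-type conversion that users currently have to write by hand (assuming the Spark 1.x API used above; the field positions and getter types are tracked manually and silently break if the schema changes):

case class Food(name: String, count: Int)

val df1 = sqlContext.load("foods.parquet")
val rdd1 = df1.rdd.map { row =>
  // Positional getters must match the schema's column order and types by hand.
  Food(row.getString(0), row.getInt(1))
}

The proposed toTypedRDD[Food]() would derive this mapping from the case class automatically.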
I originally submitted a smaller PR against spark-csv <https://github.com/databricks/spark-csv/pull/52>, but Reynold Xin suggested that converting a DataFrame to a typed RDD is not specific to spark-csv and belongs in Spark itself.