
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.3.1
    • Fix Version/s: 1.6.0
    • Component/s: SQL
    • Labels: None

    Description

      As a Spark user still working with RDDs, I'd like the ability to convert a DataFrame to a typed RDD.

      For example, if I've converted RDDs to DataFrames so that I could save them as Parquet or CSV files, I would like to rebuild the RDD from those files automatically rather than writing the row-to-type conversion myself.

      // Example case class; the field names are assumed for illustration.
      case class Food(name: String, count: Int)

      import sqlContext.implicits._  // needed for rdd0.toDF()

      val rdd0 = sc.parallelize(Seq(Food("apple", 1), Food("banana", 2), Food("cherry", 3)))
      val df0 = rdd0.toDF()
      df0.save("foods.parquet")

      val df1 = sqlContext.load("foods.parquet")
      val rdd1 = df1.toTypedRDD[Food]()  // proposed API
      // rdd0 and rdd1 should have the same elements

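      For context, a minimal sketch of the manual row-to-type conversion this proposal would replace, continuing the example above; the column order and types here are assumptions:

      // Manual Row-to-case-class conversion that toTypedRDD would automate.
      val rddManual = df1.rdd.map { row =>
        Food(row.getString(0), row.getInt(1))  // assumes column order (name, count)
      }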
      I originally submitted a smaller PR for spark-csv <https://github.com/databricks/spark-csv/pull/52>, but Reynold Xin suggested that converting a DataFrame to a typed RDD wasn't something specific to spark-csv.


          People

            Assignee: Ray Ortigas
            Reporter: Ray Ortigas
            Michael Armbrust
            Votes: 0
            Watchers: 10
