Spark / SPARK-17668

Support representing structs with case classes and tuples in Spark SQL UDF inputs

    Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None

      Description

      After getting used to case classes representing complex structures in Datasets, I was surprised to find that no such magic exists for UDFs on DataFrames. There I have to fall back to manipulating Row objects, which is error-prone and somewhat ugly.

      For example:

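      import org.apache.spark.sql.functions.{col, udf}
      import spark.implicits._  // for toDF; assumes `spark` is the SparkSession, as in spark-shell

      case class Person(name: String, age: Int)

      val df = Seq((Person("john", 33), 5), (Person("mike", 30), 6)).toDF("person", "id")
      // The UDF is declared on Person, but the struct column is handed to it as a Row.
      val incrementAge = udf { (p: Person) => p.copy(age = p.age + 1) }
      val df1 = df.withColumn("person", incrementAge(col("person")))
      df1.printSchema
      df1.show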
      

      This leads to:

      java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to Person
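
      For comparison, a minimal sketch of the Row-based fallback mentioned in the description. It assumes a local SparkSession, and the names `RowUdfSketch` and `bumpAge` are made up for illustration. The UDF accepts the struct as a Row and unpacks it by position, while the return value can still be a case class. This is one possible workaround, not a fix for the issue.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

// Defined at top level so Spark can derive the struct schema for it.
case class Person(name: String, age: Int)

object RowUdfSketch {
  // Hypothetical helper: the struct arrives as a Row, so it is unpacked by
  // position and type. A null field or a changed field order would raise a
  // MatchError here, which is the error-prone part.
  val bumpAge: Row => Person = {
    case Row(name: String, age: Int) => Person(name, age + 1)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("row-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((Person("john", 33), 5), (Person("mike", 30), 6)).toDF("person", "id")

    // Input type is Row; the Person result is converted back to a struct column.
    val df1 = df.withColumn("person", udf(bumpAge).apply(col("person")))
    df1.printSchema()
    df1.show()

    spark.stop()
  }
}
```

      The positional match is exactly the kind of manual bookkeeping that a case class input would make unnecessary: nothing ties the pattern to the schema of the `person` column at compile time.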
      

              People

              • Assignee:
                Unassigned
              • Reporter:
                koert kuipers
              • Votes:
                1
              • Watchers:
                8
