Description
If a user knows how to map a class to a struct type in Spark SQL, they should be able to register that mapping through sqlContext, so that Spark SQL can figure out the schema automatically.
trait RowSerializer[T] {
  def dataType: StructType
  def serialize(obj: T): Row
  def deserialize(row: Row): T
}

def registerUserType[T](clazz: Class[T], serializer: Class[RowSerializer[T]]): Unit
In sqlContext, we can maintain a class-to-serializer map and use it for the conversion. The serializer class can be embedded into the schema metadata, so when `select` is called we know the result should be deserialized.
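As a rough sketch of what that registry could look like inside SQLContext (the field name, type bound, and instantiation details here are illustrative assumptions, not part of the proposal):

import scala.collection.mutable

// Illustrative only: a class-to-serializer registry kept by the context.
private val userTypes = mutable.Map.empty[Class[_], RowSerializer[_]]

def registerUserType[T](clazz: Class[T], serializer: Class[_ <: RowSerializer[T]]): Unit = {
  // Instantiate once and cache; assumes the serializer has a no-arg constructor.
  userTypes(clazz) = serializer.getDeclaredConstructor().newInstance()
}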
// Register the mapping once per context.
sqlContext.registerUserType(classOf[Vector], classOf[VectorRowSerializer])
val points: RDD[LabeledPoint] = ...
// 'features carries the registered Vector type, so rows map back to Vector.
val features: RDD[Vector] = points.select('features).map { case Row(v: Vector) => v }
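For completeness, the VectorRowSerializer used above might look like the following sketch. The single array-of-doubles column layout and the import paths (which moved between Spark versions) are assumptions:

import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Illustrative serializer: flattens a Vector into one array<double> column.
// Note that sparse vectors are densified by toArray.
class VectorRowSerializer extends RowSerializer[Vector] {
  override def dataType: StructType =
    StructType(Seq(StructField("values", ArrayType(DoubleType, containsNull = false))))

  override def serialize(obj: Vector): Row = Row(obj.toArray.toSeq)

  override def deserialize(row: Row): Vector =
    Vectors.dense(row.getAs[Seq[Double]](0).toArray)
}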
Issue Links
- is depended upon by: SPARK-3573 Dataset (Resolved)
- links to: Support UDT in Python (Resolved, Xiangrui Meng)