Description
As a Spark user who uses value classes in Scala for modelling domain objects, I would also like to make use of them in Datasets.
For example, I would like to use the User case class, which uses a value class for its id, as the type of a Dataset:
- the underlying primitive should be mapped to the value-class column
- functions on the column (for example, comparisons) should only work if defined on the value class, and should use those implementations
- `show()` should pick up the `toString` method of the value class
case class Id(value: Long) extends AnyVal {
  override def toString: String = value.toHexString
}
case class User(id: Id, name: String)

import spark.implicits._

val ds = spark.sparkContext
  .parallelize(0L to 12L).map(i => (i, f"name-$i")).toDS()
  .withColumnRenamed("_1", "id")
  .withColumnRenamed("_2", "name")

// mapping should work
val usrs = ds.as[User]

// show should use toString
usrs.show()

// comparison with Long should throw an exception, as it is not defined on Id
usrs.col("id") > 0L
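For context, here is a minimal sketch probing what encoder derivation does for User today (assuming derivation via `Encoders.product`; the exact behavior varies by Spark version, see the linked issues below):

import org.apache.spark.sql.{Encoder, Encoders}

// Probe how the value class is encoded today. Depending on the Spark
// version, derivation may break at runtime (see SPARK-17368) or flatten
// Id to its underlying Long, so Id's methods are never consulted.
val userEncoder: Encoder[User] = Encoders.product[User]
println(userEncoder.schema.treeString)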
For example, `.show()` should use the `toString` of the `Id` value class:
+---+-------+
| id|   name|
+---+-------+
|  0| name-0|
|  1| name-1|
|  2| name-2|
|  3| name-3|
|  4| name-4|
|  5| name-5|
|  6| name-6|
|  7| name-7|
|  8| name-8|
|  9| name-9|
|  a|name-10|
|  b|name-11|
|  c|name-12|
+---+-------+
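Until value classes are supported, the display aspect alone can be approximated with the built-in `hex` and `lower` functions. This is only a sketch that mimics `Id.toString`; it gives none of the typed column semantics requested above:

import org.apache.spark.sql.functions.{col, hex, lower}

// Workaround sketch for the display aspect only: render the primitive
// id column as lowercase hex, mimicking Long.toHexString.
ds.withColumn("id", lower(hex(col("id")))).show()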
Issue Links
- is related to:
  - SPARK-17368 Scala value classes create encoder problems and break at runtime (Resolved)
  - SPARK-19741 ClassCastException when using Dataset with type containing value types (Resolved)