Description
Trying to apply flatMap() on a Dataset column which is of type com.A.B.
Here's the schema of the dataset:
root
 |-- id: string (nullable = true)
 |-- outputs: array (nullable = true)
 |    |-- element: string
flatMap works on RDD
ds.rdd.flatMap(_.outputs)
flatMap doesn't work on the Dataset and gives the following error:
ds.flatMap(_.outputs)
The exception:
scala.ScalaReflectionException: class com.A.B in JavaMirror … not found
at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:123)
at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:22)
at line189424fbb8cd47b3b62dc41e417841c159.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$typecreator3$1.apply(<console>:51)
at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:232)
at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:232)
at org.apache.spark.sql.SQLImplicits$$typecreator9$1.apply(SQLImplicits.scala:125)
at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:232)
at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:232)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:49)
at org.apache.spark.sql.SQLImplicits.newProductSeqEncoder(SQLImplicits.scala:125)
Spoke to Michael Armbrust and he confirmed it as a Dataset bug.
There is a workaround using explode():
ds.select(explode(col("outputs")))
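For reference, a minimal sketch of the explode() workaround as a standalone application. The case class Record is a hypothetical stand-in for com.A.B (the real class is not shown in this report), and the sample rows, the flattenOutputs helper, and the local master setting are assumptions for illustration only. The sketch shows the workaround itself; it does not attempt to reproduce the ScalaReflectionException.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical stand-in for com.A.B, shaped like the schema above.
case class Record(id: String, outputs: Seq[String])

object ExplodeWorkaround {
  // Returns one row per element of the `outputs` array.
  // The String encoder is passed explicitly, so no implicit lookup is needed here.
  def flattenOutputs(ds: Dataset[Record]): Dataset[String] =
    ds.select(explode(col("outputs")).as("output")).as[String](Encoders.STRING)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run for illustration
      .appName("explode-workaround")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val ds = Seq(Record("a", Seq("x", "y")), Record("b", Seq("z"))).toDS()

    flattenOutputs(ds).show()
    spark.stop()
  }
}
```

Going through the untyped select() and only converting back to a typed Dataset[String] at the end keeps the element type a plain String.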
Attachments
Issue Links
- is duplicated by
  - SPARK-18139 Dataset mapGroups with return typ Seq[Product] produces scala.ScalaReflectionException: object $line262.$read not found (Resolved)
  - SPARK-17890 scala.ScalaReflectionException (Resolved)