Description
I am trying to define a UserDefinedType that is backed by String but distinct from StringType in Spark 2.4.1. Either there is a bug in Spark or I am doing something incorrectly.
I define my type as follows:
class MyType extends UserDefinedType[MyValue] {
  override def sqlType: DataType = StringType
  ...
}

@SQLUserDefinedType(udt = classOf[MyType])
case class MyValue
I expect it to be read and stored as a String, just with a custom SQL type. In fact, Spark can't read the string at all:
java.lang.ClassCastException: org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$makeConverter$11 cannot be cast to org.apache.spark.unsafe.types.UTF8String
  at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
  at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getUTF8String(rows.scala:195)
  at org.apache.spark.sql.catalyst.expressions.JoinedRow.getUTF8String(JoinedRow.scala:102)
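The problem is in UnivocityParser.makeConverter: for a UDT it does not return a (String => Any) function but a (String => (String => Any)) function, so the value stored in the row is the inner converter function itself rather than the converted string, which is what the ClassCastException above reports. See UnivocityParser:184: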
case udt: UserDefinedType[_] => (datum: String) => makeConverter(name, udt.sqlType, nullable, options)
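The mismatch can be reproduced without Spark. The sketch below is a simplified model, not Spark's code: DataType, StringType, IntegerType, UdtType and both makeConverter variants are hypothetical stand-ins defined here for illustration only. It shows why the extra lambda still compiles (a function is also an Any) and one possible way to avoid it, assuming the intent is to delegate to the converter of the underlying sqlType.

```scala
// Simplified stand-ins for illustration; these are NOT the real Spark classes.
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
final case class UdtType(sqlType: DataType) extends DataType

object ConverterSketch {
  type ValueConverter = String => Any

  // Models the reported behaviour: the UDT branch wraps the recursive call in
  // a new lambda, so applying the converter to a datum returns another
  // function instead of a value. It type-checks because a function is an Any.
  def makeConverterBuggy(dataType: DataType): ValueConverter = dataType match {
    case StringType       => (datum: String) => datum
    case IntegerType      => (datum: String) => datum.toInt
    case UdtType(sqlType) => (datum: String) => makeConverterBuggy(sqlType)
  }

  // One possible fix (an assumption, not necessarily the patch Spark applied):
  // return the underlying type's converter directly.
  def makeConverterFixed(dataType: DataType): ValueConverter = dataType match {
    case StringType       => (datum: String) => datum
    case IntegerType      => (datum: String) => datum.toInt
    case UdtType(sqlType) => makeConverterFixed(sqlType)
  }
}
```

In this model, applying the buggy converter for UdtType(StringType) to a datum yields a function object, while the fixed converter yields the string itself.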