Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.1, 3.5.0
Description
The following action fails on 3.4.1, 3.5.0, and master:
scala> val df = Seq(Seq(Some(Seq(0)))).toDF("a")
org.apache.spark.SparkRuntimeException: [EXPRESSION_ENCODING_FAILED] Failed to encode a value of the expressions: mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -1), mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -2), assertnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -2), IntegerType, IntegerType)), unwrapoption(ObjectType(interface scala.collection.immutable.Seq), validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -1), ArrayType(IntegerType,false), ObjectType(class scala.Option))), None), input[0, scala.collection.immutable.Seq, true], None) AS value#0 to a row. SQLSTATE: 42846
...
Caused by: java.lang.RuntimeException: scala.Some is not a valid external type for schema of array<int>
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_0$(Unknown Source)
...
However, it succeeds on 3.3.3:
scala> val df = Seq(Seq(Some(Seq(0)))).toDF("a")
df: org.apache.spark.sql.DataFrame = [a: array<array<int>>]
scala> df.collect
res0: Array[org.apache.spark.sql.Row] = Array([WrappedArray(WrappedArray(0))])
Map of Option[Seq] also fails on 3.4.1, 3.5.0, and master:
scala> val df = Seq(Map(0 -> Some(Seq(0)))).toDF("a")
org.apache.spark.SparkRuntimeException: [EXPRESSION_ENCODING_FAILED] Failed to encode a value of the expressions: externalmaptocatalyst(lambdavariable(ExternalMapToCatalyst_key, ObjectType(class java.lang.Object), false, -1), assertnotnull(validateexternaltype(lambdavariable(ExternalMapToCatalyst_key, ObjectType(class java.lang.Object), false, -1), IntegerType, IntegerType)), lambdavariable(ExternalMapToCatalyst_value, ObjectType(class java.lang.Object), true, -2), mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -3), assertnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -3), IntegerType, IntegerType)), unwrapoption(ObjectType(interface scala.collection.immutable.Seq), validateexternaltype(lambdavariable(ExternalMapToCatalyst_value, ObjectType(class java.lang.Object), true, -2), ArrayType(IntegerType,false), ObjectType(class scala.Option))), None), input[0, scala.collection.immutable.Map, true]) AS value#0 to a row. SQLSTATE: 42846
...
Caused by: java.lang.RuntimeException: scala.Some is not a valid external type for schema of array<int>
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_0$(Unknown Source)
...
As with the first example, this succeeds on 3.3.3:
scala> val df = Seq(Map(0 -> Some(Seq(0)))).toDF("a")
df: org.apache.spark.sql.DataFrame = [a: map<int,array<int>>]
scala> df.collect
res0: Array[org.apache.spark.sql.Row] = Array([Map(0 -> WrappedArray(0))])
Other cases that fail on 3.4.1, 3.5.0, and master but work fine on 3.3.3:
- Seq[Option[Timestamp]]
- Map[Option[Timestamp]]
- Seq[Option[Date]]
- Map[Option[Date]]
- Seq[Option[BigDecimal]]
- Map[Option[BigDecimal]]
However, the following work fine on 3.3.3, 3.4.1, 3.5.0, and master:
- Seq[Option[Map]]
- Map[Option[Map]]
- Seq[Option[<primitive-type>]]
- Map[Option[<primitive-type>]]
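The cases listed above can be checked in one pass with a spark-shell sketch along the following lines. This is only an illustration: the `check` helper is hypothetical (not part of Spark), and the concrete values and the `Int` map keys and values are assumptions chosen to match the shapes named in the lists. No output is shown because the result depends on the Spark version in use.

```scala
// Sketch for spark-shell, where `spark` is already defined.
import java.sql.{Date, Timestamp}
import scala.util.{Failure, Success, Try}
import spark.implicits._

// Hypothetical helper (not part of Spark): run one case and print the outcome.
def check(label: String)(body: => Any): Unit = {
  val outcome = Try(body) match {
    case Success(_) => "ok"
    case Failure(e) => s"failed: ${e.getClass.getName}"
  }
  println(f"$label%-36s $outcome")
}

// Reported in this issue as failing on 3.4.1, 3.5.0, and master.
check("Seq[Option[Seq[Int]]]")(Seq(Seq(Some(Seq(0)))).toDF("a").collect())
check("Map[Int, Option[Seq[Int]]]")(Seq(Map(0 -> Some(Seq(0)))).toDF("a").collect())
check("Seq[Option[Timestamp]]")(Seq(Seq(Some(new Timestamp(0L)))).toDF("a").collect())
check("Seq[Option[Date]]")(Seq(Seq(Some(new Date(0L)))).toDF("a").collect())
check("Seq[Option[BigDecimal]]")(Seq(Seq(Some(BigDecimal(1)))).toDF("a").collect())

// Reported in this issue as working on 3.3.3, 3.4.1, 3.5.0, and master.
check("Seq[Option[Map[Int, Int]]]")(Seq(Seq(Some(Map(0 -> 0)))).toDF("a").collect())
check("Seq[Option[Int]]")(Seq(Seq(Some(0))).toDF("a").collect())
```

Each case is wrapped in `Try` so that one encoding failure does not stop the remaining checks.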
Issue Links
- is duplicated by SPARK-45644 After upgrading to Spark 3.4.1 and 3.5.0 we receive RuntimeException "scala.Some is not a valid external type for schema of array<string>" (Resolved)