Details
Type: Sub-task
Status: Resolved
Priority: Trivial
Resolution: Fixed
4.0.0
Description
When the rows do not match the declared schema, the error message "{actual} is not a valid external type for schema of {expected}" does not help to identify which column has the problem. I suggest adding information about the source column to the message.
How to reproduce
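import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import org.scalatest.funsuite.AnyFunSuite

// SharedSparkContext and sqlContext are expected to come from the test harness in use.
class ErrorMsgSuite extends AnyFunSuite with SharedSparkContext {

  test("shouldThrowSchemaError") {
    // The second value of each row is a byte array, but the schema below declares f3 as StringType.
    val seq: Seq[Row] = Seq(
      Row(
        toBytes("0"),
        toBytes(""),
        1L,
      ),
      Row(
        toBytes("0"),
        toBytes(""),
        1L,
      ),
    )

    val schema: StructType = new StructType()
      .add("f1", BinaryType)
      .add("f3", StringType)
      .add("f2", LongType)

    val df = sqlContext.createDataFrame(sqlContext.sparkContext.parallelize(seq), schema)

    // Rows are only converted when an action runs, so the failure surfaces in show().
    val exception = intercept[RuntimeException] {
      df.show()
    }

    assert(
      exception.getCause.getMessage
        .contains("[B is not a valid external type for schema of string")
    )
    assertResult(
      "[B is not a valid external type for schema of string"
    )(exception.getCause.getMessage)
  }

  // Converts each character of the string to a single byte.
  def toBytes(x: String): Array[Byte] = x.toCharArray.map(_.toByte)
}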
After the fix, the error message may contain extra information about the source column:
[B is not a valid external type for schema of string at getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 1, f3)
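To illustrate the idea outside of Spark, here is a minimal, Spark-free sketch of a row check that reports the position and name of the offending field. Everything in it (`FieldType`, `Field`, `RowValidator`, and the "at field N (name)" suffix) is hypothetical and chosen only for illustration; it is not Spark's implementation and does not reproduce the exact message format shown above.

```scala
// Hypothetical sketch: these types only mimic a schema, they are not Spark classes.
sealed trait FieldType { def name: String }
case object BinaryT extends FieldType { val name = "binary" }
case object StringT extends FieldType { val name = "string" }
case object LongT extends FieldType { val name = "bigint" }

final case class Field(name: String, dataType: FieldType)

object RowValidator {
  // True when the runtime value is an acceptable external type for the field.
  // A null value matches none of the cases and is reported as a mismatch.
  private def matches(value: Any, dataType: FieldType): Boolean =
    (value, dataType) match {
      case (_: Array[Byte], BinaryT) => true
      case (_: String, StringT)      => true
      case (_: Long, LongT)          => true
      case _                         => false
    }

  // Returns Left(message) for the first mismatching field, Right(()) otherwise.
  // The message carries the field index and name so the column can be located.
  def validate(row: Seq[Any], schema: Seq[Field]): Either[String, Unit] =
    row.zip(schema).zipWithIndex.collectFirst {
      case ((value, field), index) if !matches(value, field.dataType) =>
        val actual = Option(value).map(_.getClass.getName).getOrElse("null")
        s"$actual is not a valid external type for schema of ${field.dataType.name} " +
          s"at field $index (${field.name})"
    }.toLeft(())
}

object Demo {
  def main(args: Array[String]): Unit = {
    val schema = Seq(Field("f1", BinaryT), Field("f3", StringT), Field("f2", LongT))
    // The second value is a byte array although f3 is declared as a string.
    val row = Seq[Any]("0".getBytes("UTF-8"), "".getBytes("UTF-8"), 1L)
    println(RowValidator.validate(row, schema))
  }
}
```

With the row from the reproduction, the check stops at index 1 and names `f3`, which is the kind of hint the original message lacks.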