Description
The UDF needs to deserialize the UnsafeRow. When the column type is Array, the deserializer calls the `get` method on the `ColumnVector` used by the vectorized Parquet reader, but that method is not implemented, so the query fails.
Code to reproduce the issue:
```scala
// Write a small Parquet file containing an array column.
val fileName = "testfile"
val str = """{ "choices": ["key1", "key2", "key3"] }"""
val rdd = sc.parallelize(Seq(str))
val df = spark.read.json(rdd)
df.write.mode("overwrite").parquet(s"file:///tmp/$fileName")

import org.apache.spark.sql._
import org.apache.spark.sql.functions._

// Register a UDF that takes the array column; selecting it triggers the failure.
spark.udf.register("acf", (rows: Seq[Row]) => Option[String](null))
spark.read.parquet(s"file:///tmp/$fileName").select(expr("acf(choices)")).show
```
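A possible workaround (not a fix) is to disable the vectorized Parquet reader so Spark falls back to the row-based reader, which avoids the unimplemented `ColumnVector` `get` path. A minimal sketch, assuming the same spark-shell session and the `/tmp/testfile` data and `acf` UDF from the repro above:

```scala
import org.apache.spark.sql.functions.expr

// Sketch of a workaround: fall back to the non-vectorized Parquet reader
// so the array column is not read through ColumnVector.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

spark.read.parquet("file:///tmp/testfile")
  .select(expr("acf(choices)"))
  .show()

// Restore the default afterwards if desired.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "true")
```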