Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
4.0.0, 3.5.1, 3.4.3
Description
We got a customer issue that a `MergeInto` query on Iceberg table works earlier but cannot work after upgrading to Spark 3.4.
The error looks like
```
Caused by: org.apache.spark.SparkRuntimeException: Error while decoding: org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to nullable on unresolved object
upcast(getcolumnbyordinal(0, StringType), StringType, - root class: java.lang.String).toString.
```
The source table of `MergeInto` uses `ScalaUDF`. The error happens when Spark invokes the deserializer of input encoder of the `ScalaUDF` and the deserializer is not resolved yet.
The encoders of ScalaUDF are resolved by the rule `ResolveEncodersInUDF` which will be applied at the end of analysis phase.
During rewriting `MergeInto` to `ReplaceData` query, Spark creates an `Exists` subquery and `ScalaUDF` is part of the plan of the subquery. Note that the `ScalaUDF` is already resolved by the analyzer.
Then, in `ResolveSubquery` rule which resolves the subquery, it will resolve the subquery plan if it is not resolved yet. Because the subquery containing `ScalaUDF` is resolved, the rule skips it so `ResolveEncodersInUDF` won't be applied on it. So the analyzed `ReplaceData` query contains a `ScalaUDF` with encoders unresolved that cause the error.