This is a follow-up to https://issues.apache.org/jira/browse/SPARK-33398, which fixed an exception when loading a model in Spark 3 that was trained in Spark 2. After incorporating that fix into my project, I ran into another issue, introduced by the fix itself: https://github.com/apache/spark/pull/30889/files.
While loading my random forest model, which was trained in Spark 2.2, I ran into the following exception:
When I looked at the model's data, I saw that the schema uses "nodedata" instead of "nodeData". Here is what my model looks like:
I'm new to Spark, and the training of this model predates me, so I can't say whether the "nodedata" column name came from our own code or from Spark internals. I suspect it's internal Spark code.
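In the meantime, a possible workaround for a mismatch like "nodedata" vs "nodeData" is to resolve the column name case-insensitively against the saved schema before reading it. This is a minimal sketch of that idea in plain Python; the helper name is hypothetical and this is not code from Spark or from the PR above:

```python
def resolve_field(schema_fields, name):
    """Return the actual field name from `schema_fields` that matches
    `name` case-insensitively, or None if there is no match.

    Hypothetical helper: illustrates how a loader could accept both
    "nodeData" (newer Spark) and "nodedata" (older saved models).
    """
    lowered = name.lower()
    for field in schema_fields:
        if field.lower() == lowered:
            return field
    return None
```

For example, `resolve_field(["id", "nodedata"], "nodeData")` returns `"nodedata"`, so the loader could then select the column under whichever spelling the old model actually used.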
cc podongfeng, the author of the original PR adding support for loading Spark 2 models in Spark 3. Maybe you have some insight into "nodedata" vs "nodeData".