Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 3.0.1
- None
- None
Description
Loading a RandomForestClassificationModel in Spark 3.x that was trained and saved in Spark 2.x raises an exception:
... RandomForestClassificationModel.load('/path/to/my/model')
  File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 330, in load
  File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 291, in load
  File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 280, in load
  File "/usr/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 134, in deco
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: No such struct field rawCount in id, prediction, impurity, impurityStats, gain, leftChild, rightChild, split;
There seems to be a schema incompatibility between the model data saved by Spark 2.x and the data Spark 3.x expects when loading a model: the saved node data has no rawCount field, which the Spark 3.x loader tries to read.
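The mismatch reported by the exception can be illustrated with a small, Spark-free sketch. The saved field list is copied from the error message above. The expected field list is an assumption (the saved fields plus rawCount, inferred only from the error text), and the helper name missing_fields is hypothetical.

```python
from typing import Iterable, List

# Field names the AnalysisException lists for the node data saved by Spark 2.x.
SAVED_FIELDS = [
    "id", "prediction", "impurity", "impurityStats",
    "gain", "leftChild", "rightChild", "split",
]

# Assumption: the Spark 3.x loader expects the same fields plus rawCount.
# This is inferred from the error message, not taken from the Spark source.
EXPECTED_FIELDS = SAVED_FIELDS + ["rawCount"]


def missing_fields(saved: Iterable[str], expected: Iterable[str]) -> List[str]:
    """Return the expected field names absent from the saved schema, in order."""
    saved_set = set(saved)
    return [name for name in expected if name not in saved_set]


if __name__ == "__main__":
    # Print the fields the loader would fail to resolve.
    print(missing_fields(SAVED_FIELDS, EXPECTED_FIELDS))
```

With a running Spark session, the saved field names could instead be read from the schema of the model's saved node data, for example via StructType.fieldNames(), and passed to the same helper.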
If this issue is not resolved, users who want to upgrade will first have to retrain in Spark 3.x every random forest model they trained in Spark 2.x.
Issue Links
- duplicates
- SPARK-33398 AnalysisException when loading a PipelineModel with Spark 3 (Resolved)