SPARK-33661

Unable to load RandomForestClassificationModel trained in Spark 2.x

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.0.1
    • Fix Version/s: None
    • Component/s: ML
    • Labels:
      None

      Description

      Loading, in Spark 3.x, a RandomForestClassificationModel that was trained and saved in Spark 2.x raises an exception:

      ...
          RandomForestClassificationModel.load('/path/to/my/model')
        File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 330, in load
        File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 291, in load
        File "/usr/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 280, in load
        File "/usr/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
        File "/usr/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 134, in deco
        File "<string>", line 3, in raise_from
      pyspark.sql.utils.AnalysisException: No such struct field rawCount in id, prediction, impurity, impurityStats, gain, leftChild, rightChild, split;
      

      There seems to be a schema incompatibility between the model data saved by Spark 2.x and the data Spark 3.x expects when loading a model: the loader looks for a rawCount struct field that the saved node data does not contain.
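To make the mismatch concrete, here is a minimal sketch of how one might check a saved model for the missing field. It is not part of Spark. The helper names (missing_node_fields, saved_node_fields) are hypothetical, and the storage layout they rely on (a Parquet data directory whose node struct sits in a column named nodeData) is an assumption, not something the traceback confirms. Only the field names come from the exception message.

```python
# Field the Spark 3.x loader looks up, per the AnalysisException in this report.
EXPECTED_BY_SPARK3 = ("rawCount",)

# Fields the exception lists as present in the model saved by Spark 2.x.
SPARK2_NODE_FIELDS = [
    "id", "prediction", "impurity", "impurityStats",
    "gain", "leftChild", "rightChild", "split",
]


def missing_node_fields(field_names, expected=EXPECTED_BY_SPARK3):
    """Return the expected fields that are absent from field_names."""
    present = set(field_names)
    return [name for name in expected if name not in present]


def saved_node_fields(spark, data_path):
    """List the fields of the node struct in a saved model's Parquet data.

    Assumptions: data_path points at the model's Parquet data directory,
    and the node struct is stored in a column named "nodeData".
    """
    schema = spark.read.parquet(data_path).schema
    return schema["nodeData"].dataType.fieldNames()


# Checking the field list quoted in the exception message:
print(missing_node_fields(SPARK2_NODE_FIELDS))  # prints ['rawCount']
```

With a live SparkSession, missing_node_fields(saved_node_fields(spark, data_path)) would run the same check against the files on disk, where data_path is whatever directory holds the forest's node data for the model in question.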

      If this issue is not resolved, users will have to retrain in Spark 3.x every random forest model they trained in Spark 2.x before they can upgrade.

              People

              • Assignee: Unassigned
              • Reporter: Marcus Levine (marcusian)
              • Votes: 0
              • Watchers: 2
