  Spark / SPARK-6457

Error when calling PySpark RandomForestModel.load


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.1, 1.4.0
    • Component/s: MLlib, PySpark
    • Labels: None

    Description

      Reported by https://github.com/catmonkeylee:

      Summary: PySpark RandomForestModel.load fails in test script. It appears that the saved model file is empty.

      When I run the sample code in cluster mode, there is an error.

      Traceback (most recent call last):
      File "/data1/s/apps/spark-app/app/sample_rf.py", line 25, in
      sameModel = RandomForestModel.load(sc, model_path)
      File "/home/s/apps/spark/python/pyspark/mllib/util.py", line 254, in load
      java_model = cls.load_java(sc, path)
      File "/home/s/apps/spark/python/pyspark/mllib/util.py", line 250, in _load_java
      return java_obj.load(sc._jsc.sc(), path)
      File "/home/s/apps/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call
      File "/home/s/apps/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
      py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.mllib.tree.model.RandomForestModel.load.
      : java.lang.UnsupportedOperationException: empty collection
      at org.apache.spark.rdd.RDD.first(RDD.scala:1191)
      at org.apache.spark.mllib.util.Loader$.loadMetadata(modelSaveLoad.scala:125)
      at org.apache.spark.mllib.tree.model.RandomForestModel$.load(treeEnsembleModels.scala:65)
      at org.apache.spark.mllib.tree.model.RandomForestModel.load(treeEnsembleModels.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

      I run the code on a Spark cluster; the Spark version is 1.3.0.
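
      Editor's note: the "empty collection" exception above is raised by RDD.first() inside Loader.loadMetadata (modelSaveLoad.scala:125 in the stack trace), which reads <model_path>/metadata as a text file; if no lines are visible from the driver, load fails exactly as shown. The following diagnostic sketch is editor-added and hedged: the helper name is hypothetical, and the metadata/ and data/ subdirectories are assumptions based on the stack trace and the MLlib save layout. It only inspects what the driver machine can see under a locally saved model path.

      import os

      def describe_saved_model(local_path):
          """Print which part files of a locally saved model are visible on this machine."""
          for sub in ("metadata", "data"):
              d = os.path.join(local_path, sub)
              if not os.path.isdir(d):
                  print("%s: missing on this node" % d)
              else:
                  parts = [f for f in os.listdir(d) if not f.startswith(("_", "."))]
                  print("%s: %d part file(s)" % (d, len(parts)))

      describe_saved_model("/home/s/apps/spark-app/data/myModelPath")

      If metadata/ turns out to be empty or missing on the node running the driver, that matches the failure reported here.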

      The test code:
      ===================================
      from pyspark import SparkContext, SparkConf
      from pyspark.mllib.tree import RandomForest, RandomForestModel
      from pyspark.mllib.util import MLUtils

      conf = SparkConf().setAppName('LocalTest')
      sc = SparkContext(conf=conf)
      data = MLUtils.loadLibSVMFile(sc, 'data/mllib/sample_libsvm_data.txt')
      print data.count()
      (trainingData, testData) = data.randomSplit([0.7, 0.3])
      model = RandomForest.trainClassifier(trainingData, numClasses=2, categoricalFeaturesInfo={},
                                           numTrees=3, featureSubsetStrategy="auto",
                                           impurity='gini', maxDepth=4, maxBins=32)

      # Evaluate model on test instances and compute test error
      predictions = model.predict(testData.map(lambda x: x.features))
      labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
      testErr = labelsAndPredictions.filter(lambda (v, p): v != p).count() / float(testData.count())
      print('Test Error = ' + str(testErr))
      print('Learned classification forest model:')
      print(model.toDebugString())

      # Save and load model
      _model_path = "/home/s/apps/spark-app/data/myModelPath"
      model.save(sc, _model_path)
      sameModel = RandomForestModel.load(sc, _model_path)
      sc.stop()

      ===================
      run command:
      spark-submit --master spark://t0.q.net:7077 --executor-memory 1G sample_rf.py

      ======================
      Then I get this error:

      Traceback (most recent call last):
      File "/data1/s/apps/spark-app/app/sample_rf.py", line 25, in <module>
      sameModel = RandomForestModel.load(sc, _model_path)
      File "/home/s/apps/spark/python/pyspark/mllib/util.py", line 254, in load
      java_model = cls._load_java(sc, path)
      File "/home/s/apps/spark/python/pyspark/mllib/util.py", line 250, in _load_java
      return java_obj.load(sc._jsc.sc(), path)
      File "/home/s/apps/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in _call_
      File "/home/s/apps/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
      py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.mllib.tree.model.RandomForestModel.load.
      : java.lang.UnsupportedOperationException: empty collection
      at org.apache.spark.rdd.RDD.first(RDD.scala:1191)
      at org.apache.spark.mllib.util.Loader$.loadMetadata(modelSaveLoad.scala:125)
      at org.apache.spark.mllib.tree.model.RandomForestModel$.load(treeEnsembleModels.scala:65)
      at org.apache.spark.mllib.tree.model.RandomForestModel.load(treeEnsembleModels.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
      at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
      at py4j.Gateway.invoke(Gateway.java:259)
      at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
      at py4j.commands.CallCommand.execute(CallCommand.java:79)
      at py4j.GatewayConnection.run(GatewayConnection.java:207)
      at java.lang.Thread.run(Thread.java:724)
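
      Editor's note: a hedged workaround sketch, not necessarily the change that resolved this ticket. When the driver and the executors do not share the local path used above, saving the model to a filesystem that all of them can reach (for example HDFS) keeps the metadata and data parts in one place. The snippet reuses model, sc, and RandomForestModel from the test script above; the HDFS path is hypothetical.

      # Commonly used workaround, not confirmed as the fix for this ticket:
      # write the model to storage visible to the driver and every executor.
      shared_model_path = "hdfs:///user/s/models/myRandomForest"  # hypothetical HDFS path
      model.save(sc, shared_model_path)
      sameModel = RandomForestModel.load(sc, shared_model_path)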

People

    • Assignee: Joseph K. Bradley (josephkb)
    • Reporter: Joseph K. Bradley (josephkb)
    • Votes: 0
    • Watchers: 2
