Spark / SPARK-22289

Cannot save LogisticRegressionModel with bounds on coefficients


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.2, 2.3.0
    • Component/s: ML
    • Labels: None

    Description

      I think this was introduced in SPARK-20047.

      Calling save on a logistic regression model trained with bounds on its coefficients throws an error. This seems to be because Spark doesn't know how to JSON-encode the Matrix-typed bound parameter.
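      The failure can be seen in isolation: the default Param.jsonEncode only handles String and Vector values, so any Matrix-valued param throws. A minimal sketch (the parent/name/doc strings below are placeholders):

          import org.apache.spark.ml.linalg.{DenseMatrix, Vector, Vectors}
          import org.apache.spark.ml.param.Param

          // Vector-valued params encode fine via the built-in vector support...
          val vecParam = new Param[Vector]("demo", "vec", "a vector param")
          vecParam.jsonEncode(Vectors.dense(0.0))

          // ...but Matrix-valued params fall through to the default case and
          // throw scala.NotImplementedError.
          val matParam = new Param[DenseMatrix]("demo", "mat", "a matrix param")
          matParam.jsonEncode(new DenseMatrix(1, 1, Array(0.0)))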

      Model is set up like this:

          import org.apache.spark.ml.classification.LogisticRegression
          import org.apache.spark.ml.linalg.DenseMatrix

          val calibrator = new LogisticRegression()
            .setFeaturesCol("uncalibrated_probability")
            .setLabelCol("label")
            .setWeightCol("weight")
            .setStandardization(false)
            .setLowerBoundsOnCoefficients(new DenseMatrix(1, 1, Array(0.0)))
            .setFamily("binomial")
            .setProbabilityCol("probability")
            .setPredictionCol("logistic_prediction")
            .setRawPredictionCol("logistic_raw_prediction")
      
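      Fitting that estimator and saving the resulting model reproduces the error end to end. A minimal sketch (the training data and save path here are invented for illustration):

          import org.apache.spark.ml.linalg.Vectors
          import org.apache.spark.sql.SparkSession

          val spark = SparkSession.builder().getOrCreate()
          import spark.implicits._

          // Tiny training set with the column names the calibrator expects.
          val train = Seq(
            (Vectors.dense(0.1), 0.0, 1.0),
            (Vectors.dense(0.9), 1.0, 1.0)
          ).toDF("uncalibrated_probability", "label", "weight")

          val model = calibrator.fit(train)
          model.save("/tmp/calibrator-model")

      The save call fails with: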
      17/10/16 15:36:59 ERROR ApplicationMaster: User class threw exception: scala.NotImplementedError: The default jsonEncode only supports string and vector. org.apache.spark.ml.param.Param must override jsonEncode for org.apache.spark.ml.linalg.DenseMatrix.
      scala.NotImplementedError: The default jsonEncode only supports string and vector. org.apache.spark.ml.param.Param must override jsonEncode for org.apache.spark.ml.linalg.DenseMatrix.
      	at org.apache.spark.ml.param.Param.jsonEncode(params.scala:98)
      	at org.apache.spark.ml.util.DefaultParamsWriter$$anonfun$1$$anonfun$2.apply(ReadWrite.scala:296)
      	at org.apache.spark.ml.util.DefaultParamsWriter$$anonfun$1$$anonfun$2.apply(ReadWrite.scala:295)
      	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
      	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
      	at org.apache.spark.ml.util.DefaultParamsWriter$$anonfun$1.apply(ReadWrite.scala:295)
      	at org.apache.spark.ml.util.DefaultParamsWriter$$anonfun$1.apply(ReadWrite.scala:295)
      	at scala.Option.getOrElse(Option.scala:121)
      	at org.apache.spark.ml.util.DefaultParamsWriter$.getMetadataToSave(ReadWrite.scala:295)
      	at org.apache.spark.ml.util.DefaultParamsWriter$.saveMetadata(ReadWrite.scala:277)
      	at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelWriter.saveImpl(LogisticRegression.scala:1182)
      	at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:114)
      	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$saveImpl$1.apply(Pipeline.scala:254)
      	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$saveImpl$1.apply(Pipeline.scala:253)
      	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
      	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
      	at org.apache.spark.ml.Pipeline$SharedReadWrite$.saveImpl(Pipeline.scala:253)
      	at org.apache.spark.ml.PipelineModel$PipelineModelWriter.saveImpl(Pipeline.scala:337)
      	at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:114)
      	-snip-
      
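      Until the fix is available, one possible workaround (a sketch, not verified here) is to clear the Matrix-typed bound param on the fitted model before saving. The bounds only constrain training, so removing them from the fitted model should not change its predictions:

          // Drop the user-set Matrix param so that DefaultParamsWriter never
          // tries to jsonEncode it, then save as usual.
          model.clear(model.lowerBoundsOnCoefficients)
          model.save("/tmp/calibrator-model")

      Per the Fix Version/s above, the proper fix (Matrix support in param JSON encoding) landed in 2.2.2 and 2.3.0.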


          People

            Assignee: yuhao yang (yuhaoyan)
            Reporter: Nic Eggert (nseggert)
            Shepherd: Yanbo Liang
            Votes: 0
            Watchers: 6
