Spark / SPARK-22289

Cannot save LogisticRegressionModel with bounds on coefficients


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.2, 2.3.0
    • Component/s: ML
    • Labels: None

      Description

      I think this was introduced in SPARK-20047.

      Trying to call save on a logistic regression model trained with bounds on its coefficients throws an error. This seems to be because Spark doesn't know how to serialize the Matrix-valued bounds param.

      The model is set up like this:

          import org.apache.spark.ml.classification.LogisticRegression
          import org.apache.spark.ml.linalg.DenseMatrix

          val calibrator = new LogisticRegression()
            .setFeaturesCol("uncalibrated_probability")
            .setLabelCol("label")
            .setWeightCol("weight")
            .setStandardization(false)
            // single feature, binomial family: a 1x1 matrix bounding the coefficient below by 0.0
            .setLowerBoundsOnCoefficients(new DenseMatrix(1, 1, Array(0.0)))
            .setFamily("binomial")
            .setProbabilityCol("probability")
            .setPredictionCol("logistic_prediction")
            .setRawPredictionCol("logistic_raw_prediction")
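
      The exception is thrown as soon as the fitted model is written out (in the trace below it surfaces while saving a PipelineModel containing the calibrator). A minimal sketch of the triggering call; the training DataFrame name and output path are illustrative, not from the original report:

          // Sketch only: "training" and the output path are placeholders.
          val model = calibrator.fit(training)
          model.write.overwrite().save("/tmp/calibrator_model")   // throws on 2.2.0

      which fails with: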
      
      17/10/16 15:36:59 ERROR ApplicationMaster: User class threw exception: scala.NotImplementedError: The default jsonEncode only supports string and vector. org.apache.spark.ml.param.Param must override jsonEncode for org.apache.spark.ml.linalg.DenseMatrix.
      scala.NotImplementedError: The default jsonEncode only supports string and vector. org.apache.spark.ml.param.Param must override jsonEncode for org.apache.spark.ml.linalg.DenseMatrix.
      	at org.apache.spark.ml.param.Param.jsonEncode(params.scala:98)
      	at org.apache.spark.ml.util.DefaultParamsWriter$$anonfun$1$$anonfun$2.apply(ReadWrite.scala:296)
      	at org.apache.spark.ml.util.DefaultParamsWriter$$anonfun$1$$anonfun$2.apply(ReadWrite.scala:295)
      	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
      	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
      	at org.apache.spark.ml.util.DefaultParamsWriter$$anonfun$1.apply(ReadWrite.scala:295)
      	at org.apache.spark.ml.util.DefaultParamsWriter$$anonfun$1.apply(ReadWrite.scala:295)
      	at scala.Option.getOrElse(Option.scala:121)
      	at org.apache.spark.ml.util.DefaultParamsWriter$.getMetadataToSave(ReadWrite.scala:295)
      	at org.apache.spark.ml.util.DefaultParamsWriter$.saveMetadata(ReadWrite.scala:277)
      	at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelWriter.saveImpl(LogisticRegression.scala:1182)
      	at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:114)
      	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$saveImpl$1.apply(Pipeline.scala:254)
      	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$saveImpl$1.apply(Pipeline.scala:253)
      	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
      	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
      	at org.apache.spark.ml.Pipeline$SharedReadWrite$.saveImpl(Pipeline.scala:253)
      	at org.apache.spark.ml.PipelineModel$PipelineModelWriter.saveImpl(Pipeline.scala:337)
      	at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:114)
      	-snip-
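
      Fixed in 2.2.2 / 2.3.0. On affected versions, one possible (untested) workaround sketch is to clear the bound matrices on the fitted model before saving: the bounds only constrain training, they have no default value, and once cleared the writer never has to JSON-encode a Matrix-valued param. The param names below come from LogisticRegressionParams; whether dropping them from the saved metadata is acceptable depends on the use case:

          // Untested workaround sketch for Spark 2.2.0/2.2.1: drop the Matrix-valued
          // bound params so the default Param.jsonEncode is never asked to
          // serialize a DenseMatrix.
          model.clear(model.lowerBoundsOnCoefficients)
          model.clear(model.upperBoundsOnCoefficients)
          model.write.overwrite().save("/tmp/calibrator_model")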
      

        Attachments

          Activity

            People

            • Assignee: yuhao yang (yuhaoyan)
            • Reporter: Nic Eggert (nseggert)
            • Shepherd: Yanbo Liang
            • Votes: 0
            • Watchers: 6
