Spark / SPARK-17792

L-BFGS solver for linear regression does not accept general numeric label column types

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.2, 2.1.0
    • Component/s: ML
    • Labels: None

    Description

      There's a bug in how linear regression accepts numeric column types. We cast the label to DoubleType in the code path that uses the normal solver, but not in the code path for the l-bfgs solver. The following reproduces the problem:

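      // Intended for spark-shell, which provides `spark` and spark.implicits._ (needed for toDF).
      import org.apache.spark.ml.feature.LabeledPoint
      import org.apache.spark.ml.linalg.Vectors
      import org.apache.spark.ml.regression.LinearRegression
      import org.apache.spark.sql.functions.lit
      import org.apache.spark.sql.types._

      // One labeled row plus a weight column cast to a non-Double numeric type (LongType).
      val df = Seq(LabeledPoint(1.0, Vectors.dense(1.0))).toDF().withColumn("weight", lit(1.0).cast(LongType))
      val lr = new LinearRegression().setSolver("l-bfgs").setWeightCol("weight")
      lr.fit(df)  // triggers the problem described above on affected versions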
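
Until a fixed release is available, one possible workaround is to cast the numeric columns to DoubleType yourself before calling fit. The sketch below is not taken from the issue or from the actual patch: it assumes the failure comes only from non-Double column types, and the three-row dataset, the local SparkSession, and the name `casted` are illustrative. Casting a column that is already DoubleType is harmless.

```scala
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.{DoubleType, LongType}

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SPARK-17792-workaround")
  .getOrCreate()
import spark.implicits._

// Illustrative data (an assumption, not from the issue): three rows and a LongType weight column.
val df = Seq(
  LabeledPoint(1.0, Vectors.dense(1.0)),
  LabeledPoint(2.0, Vectors.dense(2.0)),
  LabeledPoint(3.0, Vectors.dense(3.0))
).toDF().withColumn("weight", lit(1.0).cast(LongType))

// Workaround sketch: cast label and weight to DoubleType up front,
// so the solver never sees another numeric type.
val casted = df
  .withColumn("label", col("label").cast(DoubleType))
  .withColumn("weight", col("weight").cast(DoubleType))

val lr = new LinearRegression().setSolver("l-bfgs").setWeightCol("weight")
val model = lr.fit(casted)
```

The explicit cast keeps the input schema independent of which solver is selected, at the cost of one extra projection over the DataFrame.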
      

          People

            Assignee: sethah Seth Hendrickson
            Reporter: sethah Seth Hendrickson