Spark / SPARK-21681

MLOR does not work correctly when featureStd contains zero

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.3.0
    • Fix Version/s: 2.2.1, 2.3.0
    • Component/s: ML

    Description

      MLOR (multinomial logistic regression) does not work correctly when featureStd contains zero.
      The bug can be reproduced with a dataset like the one below, in which one feature has zero variance. Training on it produces a wrong result: all coefficients become 0.

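          // Assumes the context of Spark's logistic regression test suite: `sc`, `seed`,
          // `generateMultinomialLogisticInput`, and the implicits behind `toDF()` come from there.
          import org.apache.spark.sql.functions.lit

          val multinomialDatasetWithZeroVar = {
            val nPoints = 100
            val coefficients = Array(
              -0.57997, 0.912083, -0.371077,
              -0.16624, -0.84355, -0.048509)

            val xMean = Array(5.843, 3.0)
            val xVariance = Array(0.6856, 0.0)  // second feature has zero variance, i.e. it is constant

            val testData = generateMultinomialLogisticInput(
              coefficients, xMean, xVariance, addIntercept = true, nPoints, seed)

            // Spread the data over 4 partitions and give every row a unit weight.
            val df = sc.parallelize(testData, 4).toDF().withColumn("weight", lit(1.0))
            df.cache()
            df
          }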
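
      The ticket does not spell out the root cause, so the following is only a minimal sketch of why a zero entry in featureStd is hazardous during feature standardization. It is not Spark's actual implementation or the actual fix. The names `ZeroStdSketch`, `scaleNaive`, and `scaleGuarded` are hypothetical and introduced only for illustration. The sketch shows that dividing a constant feature by a standard deviation of zero yields a non-finite value, while a guarded version maps that feature to 0.0.

```scala
object ZeroStdSketch {
  // Naive scaling: divide every feature value by its standard deviation.
  // A zero standard deviation makes the result non-finite.
  def scaleNaive(x: Array[Double], std: Array[Double]): Array[Double] =
    x.zip(std).map { case (v, s) => v / s }

  // Guarded scaling (an assumption about one reasonable policy):
  // a constant feature carries no signal, so it is mapped to 0.0.
  def scaleGuarded(x: Array[Double], std: Array[Double]): Array[Double] =
    x.zip(std).map { case (v, s) => if (s != 0.0) v / s else 0.0 }

  def main(args: Array[String]): Unit = {
    // Means and variances taken from the reproduction dataset above.
    val x = Array(5.843, 3.0)
    val std = Array(math.sqrt(0.6856), 0.0)

    println(scaleNaive(x, std).mkString(", "))   // second entry is Infinity
    println(scaleGuarded(x, std).mkString(", ")) // second entry is 0.0
  }
}
```

      With the guarded version, the zero-variance feature contributes nothing to the optimization instead of poisoning it with non-finite values, and the coefficients of the remaining features can still be estimated.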
      

          People

            Assignee: Weichen Xu (weichenxu123)
            Reporter: Weichen Xu (weichenxu123)
            Shepherd: Joseph K. Bradley