  1. Spark
  2. SPARK-21681

MLOR does not work correctly when featureStd contains zero

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.3.0
    • Fix Version/s: 2.2.1, 2.3.0
    • Component/s: ML
    • Labels:

      Description

      MLOR (multinomial logistic regression) does not work correctly when featureStd contains zero.
      The bug can be reproduced with a dataset such as the following, in which one feature has zero variance. Training on it produces a wrong result: all coefficients become 0.

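          import org.apache.spark.sql.functions.lit

          // Note: generateMultinomialLogisticInput and seed are not defined in this
          // snippet; they are presumably the helpers from Spark's LogisticRegressionSuite.
          // toDF() also needs the session's implicits in scope.
          val multinomialDatasetWithZeroVar = {
            val nPoints = 100
            val coefficients = Array(
              -0.57997, 0.912083, -0.371077,
              -0.16624, -0.84355, -0.048509)

            val xMean = Array(5.843, 3.0)
            val xVariance = Array(0.6856, 0.0)  // second feature has zero variance

            val testData = generateMultinomialLogisticInput(
              coefficients, xMean, xVariance, addIntercept = true, nPoints, seed)

            // Every row gets the same weight of 1.0.
            val df = sc.parallelize(testData, 4).toDF().withColumn("weight", lit(1.0))
            df.cache()
            df
          }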
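
      The issue text does not say where the zero standard deviation enters the computation. As a general illustration only, and not Spark's implementation, the following plain-Scala sketch shows why a constant feature is hazardous for any step that standardizes by dividing by the per-feature standard deviation. The object name and the helpers columnStd, scaleNaive and scaleGuarded are hypothetical and exist only for this example.

```scala
// Illustrative sketch only: all names here are hypothetical, not Spark APIs.
object ZeroStdSketch {

  /** Population standard deviation of each column of a row-major dataset. */
  def columnStd(rows: Seq[Array[Double]]): Array[Double] = {
    val n = rows.length.toDouble
    val dim = rows.head.length
    Array.tabulate(dim) { j =>
      val mean = rows.map(_(j)).sum / n
      math.sqrt(rows.map(r => math.pow(r(j) - mean, 2)).sum / n)
    }
  }

  /** Naive scaling: divides by the standard deviation unconditionally. */
  def scaleNaive(x: Array[Double], std: Array[Double]): Array[Double] =
    x.zip(std).map { case (v, s) => v / s }

  /** Guarded scaling: a zero-std feature is mapped to 0.0 instead of dividing. */
  def scaleGuarded(x: Array[Double], std: Array[Double]): Array[Double] =
    x.zip(std).map { case (v, s) => if (s != 0.0) v / s else 0.0 }

  def main(args: Array[String]): Unit = {
    // The second column is constant, so its standard deviation is exactly 0.0.
    val rows = Seq(Array(5.1, 3.0), Array(6.4, 3.0), Array(5.9, 3.0))
    val std = columnStd(rows)

    println(std.mkString(", "))
    // The naive version yields Infinity for the constant feature.
    println(scaleNaive(rows.head, std).mkString(", "))
    // The guarded version yields 0.0 for the constant feature.
    println(scaleGuarded(rows.head, std).mkString(", "))
  }
}
```

      Mapping a constant feature to 0.0 is one possible design choice among several. It reflects the fact that a feature with no variance carries no information for separating classes, but it is an assumption of this sketch and not a description of the fix applied in Spark.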
      

            People

            • Assignee:
              WeichenXu123 Weichen Xu
              Reporter:
              WeichenXu123 Weichen Xu
              Shepherd:
              Joseph K. Bradley
            • Votes:
              0
              Watchers:
              2
