[SPARK-38588] Validate input dataset of ml.classification - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Resolved
Affects Version/s: 3.4.0
Fix Version/s: 3.4.0
Component/s: ML
Labels: None

Description

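LinearSVC should fail fast when the input dataset contains invalid values such as NaN or Infinity. Without validation, fit succeeds silently and returns a model whose intercept and coefficients are NaN: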

import org.apache.spark.ml.feature._
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.classification._

// Two training rows whose feature vectors contain NaN and Infinity (run in spark-shell).
val df = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(1.0, Double.NaN)),
  LabeledPoint(0.0, Vectors.dense(Double.PositiveInfinity, 2.0))
)).toDF()

val svc = new LinearSVC()
val model = svc.fit(df)

scala> model.intercept
res0: Double = NaN

scala> model.coefficients
res1: org.apache.spark.ml.linalg.Vector = [NaN,NaN]
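
For illustration only, the kind of per-row check that "fail fast" implies could look roughly like the sketch below. It is plain Scala with no Spark dependency, and it is not the change made in the linked pull requests. The object InputValidation and its methods are hypothetical names, and the rule that labels must be exactly 0 or 1 is an assumption based on LinearSVC being a binary classifier.

```scala
// Hypothetical helper, not part of Spark: rejects a training row
// that would otherwise lead to NaN coefficients after fitting.
object InputValidation {

  // True only for ordinary numbers (neither NaN nor +/-Infinity).
  def isFinite(x: Double): Boolean = !x.isNaN && !x.isInfinity

  // Throws IllegalArgumentException on the first invalid value found.
  // Assumption: binary labels are encoded as exactly 0.0 or 1.0.
  def validateRow(label: Double, features: Array[Double]): Unit = {
    require(label == 0.0 || label == 1.0,
      s"Label must be 0 or 1 for binary classification, got $label")
    features.zipWithIndex.foreach { case (x, i) =>
      require(isFinite(x), s"Feature at index $i must be finite, got $x")
    }
  }
}
```

Applied to the two rows in the reproduction above, validateRow would throw for both, on the NaN in the first row and on the Infinity in the second, before any optimization starts.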

Issue Links

links to

[Github] Pull Request #35893 (zhengruifeng)

[Github] Pull Request #36026 (jackylee-ch)

People

Assignee: Unassigned

Reporter: Ruifeng Zheng

Votes: 0

Watchers: 2

Dates

Created: 17/Mar/22 10:29

Updated: 02/Apr/22 14:42

Resolved: 24/Mar/22 08:05