Description
This JIRA records behavior changes in ML/MLlib between 2.0 and 2.1, so that we can note them (if any) in the Migration Guide section of the user guide. If you find one, please comment below and link the corresponding JIRA here.
- SPARK-17389: Reduce the KMeans default number of k-means|| init steps from 5 to 2.
- SPARK-17870: ChiSquareSelector uses pValue rather than the raw statistic for SelectKBest features.
- SPARK-3261: KMeans returns potentially fewer than k cluster centers in cases where k distinct centroids aren't available or aren't selected.
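To illustrate the SPARK-3261 item, here is a minimal sketch in plain Python of why a request for k centers can yield fewer than k. It is not Spark's implementation: the helper `pick_distinct_centers` and the sample data are hypothetical, and the selection rule (first-seen distinct points) is an assumption chosen only to keep the example small.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def pick_distinct_centers(points: Sequence[Point], k: int) -> List[Point]:
    """Hypothetical helper: return up to k distinct points in first-seen order.

    This only mimics the observable effect described in SPARK-3261 (fewer
    than k centers when k distinct candidates do not exist); it is not how
    Spark chooses its initial centers.
    """
    centers: List[Point] = []
    seen = set()
    for p in points:
        if p not in seen:
            seen.add(p)
            centers.append(p)
            if len(centers) == k:
                break
    return centers


# Four rows, but only two distinct points.
data = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (1.0, 1.0)]

# Ask for three centers; only two distinct candidates exist.
centers = pick_distinct_centers(data, k=3)
print(len(centers))  # 2
```

The practical consequence for callers is that code written against 2.0 which assumes exactly k centers (for example, by indexing centers 0 through k-1) should read the actual number of centers from the fitted model instead.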
Issue Links
- contains
  - SPARK-17870 ML/MLLIB: ChiSquareSelector based on Statistics.chiSqTest(RDD) is wrong (Resolved)
  - SPARK-17748 One-pass algorithm for linear regression with L1 and elastic-net penalties (Resolved)
- relates to
  - SPARK-13448 Document MLlib behavior changes in Spark 2.0 (Resolved)