-
Type:
New Feature
-
Status: Resolved
-
Priority:
Minor
-
Resolution: Incomplete
-
Affects Version/s: None
-
Fix Version/s: None
-
Component/s: ML
-
Labels:
Add a mode strategy to ml.feature.Imputer for categorical features. This needs to wait until the PR for SPARK-13568 is merged.
https://github.com/apache/spark/pull/11601
From the comments by jkbradley and Nick Pentreath in the PR:
Investigate the efficiency of approaches using DataFrame/Dataset and/or approximate approaches such as frequentItems or Count-Min Sketch (this will require an update to CMS so that it returns "heavy hitters").
Investigate whether we can use metadata to allow mode only for categorical features (or, as a perhaps easier alternative, allow mode only for Int/Long columns).
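To make the two options above concrete, here is a minimal sketch in plain Python, not Spark code. It assumes integer-valued columns with `None` marking a missing entry. The function names (`exact_mode`, `heavy_hitter_candidates`, `impute_mode`) are hypothetical, and the Misra-Gries summary only stands in for an approximate frequent-items approach; it is not claimed to be what frequentItems or CMS does internally.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def exact_mode(values: Iterable[Optional[int]]) -> Optional[int]:
    """Exact mode of the non-missing values (None marks a missing entry)."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None  # every entry is missing, so there is no mode
    best = max(counts.values())
    # Break ties by the smallest value so the result is deterministic.
    return min(v for v, c in counts.items() if c == best)


def heavy_hitter_candidates(values: Iterable[Optional[int]], k: int) -> Dict[int, int]:
    """One-pass Misra-Gries summary that keeps at most k - 1 counters.

    Any value occurring more than n / k times among the n non-missing
    entries is guaranteed to be a key of the result. The stored counts
    are lower bounds, so a second pass is needed for exact counts.
    """
    counters: Dict[int, int] = {}
    for v in values:
        if v is None:
            continue
        if v in counters:
            counters[v] += 1
        elif len(counters) < k - 1:
            counters[v] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def impute_mode(values: List[Optional[int]]) -> List[Optional[int]]:
    """Replace missing entries with the exact mode of the column."""
    mode = exact_mode(values)
    return [mode if v is None else v for v in values]


column = [3, 1, None, 3, 2, None, 3, 1]
print(exact_mode(column))
print(impute_mode(column))
print(heavy_hitter_candidates(column, k=3))
```

The exact version needs a count for every distinct value, while the summary uses bounded memory but yields only candidates. Which one is preferable for Imputer is the open question this ticket raises.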
- is blocked by
-
SPARK-13568 Create feature transformer to impute missing values
-
- Resolved
-