Details
Type: New Feature
Status: Resolved
Priority: Minor
Resolution: Incomplete
Description
Add a mode strategy to ml.feature.Imputer for categorical features. This needs to wait until the PR for SPARK-13568 is merged:
https://github.com/apache/spark/pull/11601
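For reference, a mode strategy replaces each missing entry with the most frequent non-missing value of the column. The following is a minimal pure-Python sketch of that idea, not Spark code: the function name `impute_mode`, the use of `None` as the missing-value marker, and the tie-breaking rule (smallest value wins) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def impute_mode(column: Sequence[Optional[Hashable]]) -> List[Hashable]:
    """Replace missing entries (None) with the most frequent non-missing value."""
    counts = Counter(v for v in column if v is not None)
    if not counts:
        raise ValueError("cannot compute a mode: column has no non-missing values")
    top = max(counts.values())
    # Assumption: break ties deterministically by taking the smallest value.
    mode = min(v for v, c in counts.items() if c == top)
    return [mode if v is None else v for v in column]
```

For example, `impute_mode([1, 2, 2, None, 3, None])` fills both gaps with `2`. An exact count like this needs a full aggregation over the column, which is why the efficiency of the approach matters for large datasets.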
From the comments by jkbradley and Nick Pentreath on the PR:
- Investigate the efficiency of approaches using DataFrame/Dataset operations and/or approximate approaches such as frequentItems or Count-Min Sketch (the latter will require an update to CMS so that it returns "heavy hitters").
- Investigate whether we can use column metadata to allow mode only for categorical features (or, as an easier alternative, allow mode only for Int/Long columns).
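To make the approximate option more concrete, here is a minimal pure-Python sketch of the Misra-Gries frequent-items summary, one well-known heavy-hitters technique. It is not the code behind frequentItems or Count-Min Sketch in Spark; the names `misra_gries` and `approx_mode`, the default `k`, and the `None` missing-value marker are assumptions for illustration. The summary yields a small set of candidates, and a second pass counts only those candidates exactly.

```python
from typing import Dict, Hashable, Iterable, Optional


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Return at most k - 1 candidate heavy hitters with lower-bound counts.

    Any value occurring more than n / k times in a stream of length n is
    guaranteed to be among the returned keys; the converse does not hold.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free slot: decrement every counter and drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def approx_mode(values: Iterable[Optional[Hashable]], k: int = 10) -> Optional[Hashable]:
    """Pick the most frequent value among the Misra-Gries candidates."""
    data = [v for v in values if v is not None]
    candidates = misra_gries(data, k)
    if not candidates:
        return None
    # Second pass: exact counts, but only for the few surviving candidates.
    exact = {c: 0 for c in candidates}
    for v in data:
        if v in exact:
            exact[v] += 1
    top = max(exact.values())
    # Assumption: break ties by taking the smallest value.
    return min(c for c, n in exact.items() if n == top)
```

The trade-off is that the true mode is only guaranteed to be found when it accounts for more than a `1/k` fraction of the non-missing values; for flatter distributions the result is a best-effort candidate rather than the exact mode.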
Issue Links
- is blocked by SPARK-13568 Create feature transformer to impute missing values (Resolved)