SPARK-15041: Adding mode strategy for ml.feature.Imputer for categorical features

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML
    • Labels:

      Description

      Add a mode strategy to ml.feature.Imputer for categorical features. This needs to wait until the PR for SPARK-13568 is merged:
      https://github.com/apache/spark/pull/11601

      From the comments by jkbradley and Nick Pentreath on the PR:

      • Investigate the efficiency of approaches using DataFrame/Dataset operations and/or approximate approaches such as frequentItems or Count-Min Sketch (the latter would require an update to CMS so that it can return "heavy hitters").
      • Investigate whether metadata can be used to allow mode only for categorical features (or, as a perhaps easier alternative, allow mode only for Int/Long columns).
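
The trade-off between an exact and an approximate mode can be sketched in plain Python. This is an illustration only, not Spark code and not the implementation proposed in the PR: the function names are hypothetical, missing values are assumed to be represented as None, and a Misra-Gries summary stands in for the approximate "heavy hitters" approaches mentioned above.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Optional


def exact_mode(values: Iterable[Optional[Hashable]]) -> Optional[Hashable]:
    """Exact mode: count every distinct non-missing value (one counter per value)."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None  # column holds only missing values
    return counts.most_common(1)[0][0]


def misra_gries_candidates(values: Iterable[Optional[Hashable]], k: int) -> Dict[Hashable, int]:
    """Approximate heavy hitters using at most k - 1 counters.

    Every value that occurs more than n / k times among the n non-missing
    values is guaranteed to be among the returned keys. The stored counts
    are lower bounds, not exact frequencies.
    """
    counters: Dict[Hashable, int] = {}
    for v in values:
        if v is None:
            continue
        if v in counters:
            counters[v] += 1
        elif len(counters) < k - 1:
            counters[v] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def impute_mode(values: Iterable[Optional[Hashable]], mode: Hashable) -> List[Hashable]:
    """Replace each missing value with the given mode."""
    return [mode if v is None else v for v in values]


if __name__ == "__main__":
    column = ["a", None, "b", "a", "c", None, "a"]
    mode = exact_mode(column)
    print(mode, impute_mode(column, mode))
    print(misra_gries_candidates(column, k=2))
```

The exact version needs memory proportional to the number of distinct values, while the summary is bounded by k. The summary only guarantees values above the n / k threshold, so a mode that falls below it can be missed; a second exact pass over the surviving candidates would be needed to confirm which one is most frequent.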

              People

              • Assignee: Unassigned
              • Reporter: yuhao yang (yuhaoyan)
              • Votes: 0
              • Watchers: 5