Spark / SPARK-15041

Add mode strategy to ml.feature.Imputer for categorical features


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML

    Description

      Add a mode strategy to ml.feature.Imputer for categorical features. This needs to wait until the PR for SPARK-13568 is merged.
      https://github.com/apache/spark/pull/11601
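As a rough illustration of what a mode strategy would compute for one column, here is a minimal plain-Python sketch. It is not Spark code: the function names `column_mode` and `impute_mode`, and the use of `None` to mark missing values, are assumptions made for the example. It counts the non-missing values, picks the most frequent one (breaking ties by the smallest value so the result is deterministic), and fills every gap with it.

```python
from collections import Counter
from typing import List, Optional


def column_mode(values: List[Optional[int]]) -> int:
    """Most frequent non-missing value; ties go to the smallest value."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        raise ValueError("column has no non-missing values to take a mode of")
    # Sort key: highest count first, then smallest value for a deterministic tie-break.
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]


def impute_mode(values: List[Optional[int]]) -> List[int]:
    """Replace each missing entry (None) with the column's mode."""
    mode = column_mode(values)
    return [mode if v is None else v for v in values]


print(impute_mode([1, None, 2, 2, None, 3]))  # [1, 2, 2, 2, 2, 3]
```

The exact count needs one pass and memory proportional to the number of distinct values, which is fine for low-cardinality categorical columns but is the cost the efficiency question in the PR comments is about.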

      From comments by jkbradley and Nick Pentreath in the PR:

      Investigate the efficiency of approaches using DataFrame/Dataset and/or approximate approaches such as frequentItems or Count-Min Sketch (this will require updating CMS to return "heavy hitters").
      Investigate whether metadata can be used to allow mode only for categorical features (or, as an easier alternative, allow mode only for Int/Long columns).
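The Count-Min Sketch route can be sketched in plain Python. A CMS only answers "how often did item x appear?", which is why the comment says it would need an extension to return heavy hitters. The candidate table below is one hypothetical way to do that: after each update, keep the `top_k` items with the largest estimated counts and report the largest as the approximate mode. The class name, parameters, and blake2b-based hashing are assumptions for illustration, not Spark's API. Estimates never undercount; with width w and depth d, the standard Count-Min guarantee is that an estimate exceeds the true count by at most (e/w)·N with probability at least 1 − e^(−d), where N is the total number of items added. The candidate table itself is a heuristic with no such guarantee.

```python
import hashlib


class CountMinHeavyHitters:
    """Count-Min Sketch plus a small table of likely heavy hitters (hypothetical)."""

    def __init__(self, width=256, depth=4, top_k=8):
        self.width = width
        self.depth = depth
        self.top_k = top_k
        self.table = [[0] * width for _ in range(depth)]
        self.candidates = {}  # item -> estimated count at its most recent update

    def _buckets(self, item):
        # One hash per row: blake2b salted with the row index.
        data = repr(item).encode()
        for row in range(self.depth):
            salt = row.to_bytes(16, "little")
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def estimate(self, item):
        # Minimum over rows: never below the true count, may be above it.
        return min(self.table[row][col] for row, col in self._buckets(item))

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count
        est = self.estimate(item)
        if item in self.candidates or len(self.candidates) < self.top_k:
            self.candidates[item] = est
        else:
            # Evict the weakest candidate if this item now looks heavier.
            weakest = min(self.candidates, key=self.candidates.get)
            if est > self.candidates[weakest]:
                del self.candidates[weakest]
                self.candidates[item] = est

    def approx_mode(self):
        return max(self.candidates, key=self.candidates.get)


sketch = CountMinHeavyHitters(width=256, depth=4, top_k=8)
for value in [3, 1, 3, 2, 3, None, 1, 3]:
    if value is not None:  # skip missing entries, as an imputer would
        sketch.add(value)
print(sketch.approx_mode())
```

Memory is fixed at `width * depth` counters plus `top_k` candidates regardless of column cardinality. That trade-off against an exact count is what the efficiency investigation would need to measure.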

            People

              Assignee: Unassigned
              Reporter: yuhao yang (yuhaoyan)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: