Spark / SPARK-13568

Create feature transformer to impute missing values

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: ML
    • Labels: None

    Description

      It is quite common to encounter missing values in datasets. It would be useful to implement a Transformer that can impute missing data points, similar to scikit-learn's Imputer.
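
      To make the request concrete, here is a minimal plain-Python sketch (no Spark) of the two steps such an imputer needs: a fitting step that learns one replacement value (a "surrogate") per column, and a transform step that fills missing entries with it. In Spark ML terms, learning a statistic from data would make the first step an Estimator whose fitted Model is the Transformer. The class names `MeanImputer` and `MeanImputerModel`, the row-as-dict representation, and treating both `None` and NaN as missing are all assumptions made for illustration only.

```python
import math
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[float]]


def is_missing(value: Optional[float]) -> bool:
    """Treat both None and NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


class MeanImputerModel:
    """Hypothetical fitted 'Transformer': holds one surrogate value per column."""

    def __init__(self, surrogates: Dict[str, float]) -> None:
        self.surrogates = surrogates

    def transform(self, rows: Sequence[Row]) -> List[Row]:
        # Build new rows so the input is not mutated; fill missing entries only.
        out = []
        for row in rows:
            new_row = dict(row)
            for col, surrogate in self.surrogates.items():
                if is_missing(new_row.get(col)):
                    new_row[col] = surrogate
            out.append(new_row)
        return out


class MeanImputer:
    """Hypothetical 'Estimator': learns the mean of the observed values per column."""

    def __init__(self, input_cols: Sequence[str]) -> None:
        self.input_cols = list(input_cols)

    def fit(self, rows: Sequence[Row]) -> MeanImputerModel:
        surrogates = {}
        for col in self.input_cols:
            observed = [r[col] for r in rows if not is_missing(r.get(col))]
            if not observed:
                raise ValueError(f"column {col!r} has no non-missing values")
            surrogates[col] = sum(observed) / len(observed)
        return MeanImputerModel(surrogates)


rows = [{"a": 1.0, "b": None}, {"a": float("nan"), "b": 4.0}, {"a": 3.0, "b": 8.0}]
model = MeanImputer(["a", "b"]).fit(rows)
print(model.transform(rows))
# [{'a': 1.0, 'b': 6.0}, {'a': 2.0, 'b': 4.0}, {'a': 3.0, 'b': 8.0}]
```

      Separating fit from transform means the surrogates are computed once on training data and then applied unchanged to any later data, which is the behaviour one would expect from an imputer in an ML pipeline.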

      Initially, the imputation strategies could include mean, median, and most frequent value, and other approaches could be added later. Where possible, existing DataFrame functionality can be reused (e.g., approximate quantiles for the median).
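
      The three initial strategies differ only in how the surrogate is computed from a column's observed values. The plain-Python sketch below shows one way to compute each; the function name `compute_surrogate` and the strategy strings are hypothetical. It computes an exact median with `statistics.median`, whereas a distributed Spark implementation would more plausibly reuse the DataFrame approximate-quantile support (`approxQuantile`), as the description suggests. Ties in "most frequent" are broken by taking the smallest value, which is an arbitrary choice of this sketch.

```python
import math
import statistics
from collections import Counter
from typing import Iterable, Optional


def compute_surrogate(values: Iterable[Optional[float]], strategy: str) -> float:
    """Compute the replacement value for one column, ignoring None and NaN."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    if not observed:
        raise ValueError("no non-missing values to impute from")
    if strategy == "mean":
        return sum(observed) / len(observed)
    if strategy == "median":
        # Exact median; a Spark version would likely use approximate quantiles.
        return float(statistics.median(observed))
    if strategy == "most_frequent":
        counts = Counter(observed)
        best = max(counts.values())
        # Break ties deterministically with the smallest of the most common values.
        return min(v for v, c in counts.items() if c == best)
    raise ValueError(f"unknown strategy: {strategy!r}")


column = [1.0, None, 2.0, 2.0, float("nan"), 10.0]
for s in ("mean", "median", "most_frequent"):
    print(s, compute_surrogate(column, s))
# mean 3.75
# median 2.0
# most_frequent 2.0
```

      The example also shows why the choice of strategy matters: the single outlier 10.0 pulls the mean up to 3.75, while the median and most frequent value both stay at 2.0.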

            People

              yuhao yang (yuhaoyan)
              Nicholas Pentreath (mlnick)
              Nicholas Pentreath
              Votes: 1
              Watchers: 8