Spark / SPARK-13568

Create feature transformer to impute missing values

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: ML
    • Labels:
      None

      Description

      Missing values are common in data sets. It would be useful to implement a Transformer that can impute missing data points, similar to Imputer in scikit-learn.

      Initially, the imputation options could include mean, median, and most frequent, with other approaches added later. Where possible, existing DataFrame code can be reused (e.g. for approximate quantiles).
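
      To make the three proposed strategies concrete, here is a minimal plain-Python sketch. It is not Spark code and not the API that was eventually merged. The names is_missing, fit_fill_value and impute are hypothetical, and treating both None and NaN as missing is an assumption. The median is computed exactly here, whereas a DataFrame-based version would presumably rely on approximate quantiles as suggested above.

```python
import math
from collections import Counter
from statistics import mean, median


def is_missing(value):
    """Assumption: None and NaN both count as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fit_fill_value(column, strategy="mean"):
    """Compute the replacement value from the observed (non-missing) entries."""
    observed = [v for v in column if not is_missing(v)]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    if strategy == "mean":
        return mean(observed)
    if strategy == "median":
        return median(observed)  # exact median; fine for a small in-memory list
    if strategy == "most_frequent":
        return Counter(observed).most_common(1)[0][0]
    raise ValueError(f"unknown strategy: {strategy!r}")


def impute(column, fill_value):
    """Return a copy of the column with missing entries replaced."""
    return [fill_value if is_missing(v) else v for v in column]


# Hypothetical example column with two missing entries.
ages = [10.0, None, 10.0, float("nan"), 30.0, 70.0]
fill = fit_fill_value(ages, strategy="median")
imputed = impute(ages, fill)
```

      Keeping the fit step (computing the fill value) separate from the transform step (applying it) means a value learned on training data can be reused unchanged on new data.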

              People

              • Assignee:
                yuhao yang (yuhaoyan)
                Reporter:
                Nick Pentreath (mlnick)
                Shepherd:
                Nick Pentreath
              • Votes:
                1
                Watchers:
                8