  1. Spark
  2. SPARK-34080

Add UnivariateFeatureSelector to deprecate existing selectors



    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.1.1, 3.2.0
    • Fix Version/s: 3.1.1, 3.2.0
    • Component/s: ML
    • Labels:
    • Target Version/s:


      In SPARK-26111, we introduced a few univariate feature selectors that share a common set of params. They are named after the underlying statistical test, so users must understand each test to find the selector that matches their scenario. It would be better to introduce a single class, UnivariateFeatureSelector, that accepts a selection criterion and a score method (both as string names). We could then deprecate all other univariate selectors.

      For the params, instead of asking users which score function to use, it is friendlier to ask them to specify the feature and label types (continuous or categorical) and to set a default score function for each combination. We can also detect the types from feature metadata when it is available. Advanced users can override the default if multiple score functions are compatible with the given feature type and label type. Example (param names are not finalized):

      selector = UnivariateFeatureSelector(featureCols=["x", "y", "z"], labelCol=["target"], featureType="categorical", labelType="continuous", select="bestK", k=100)
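      The default-per-combination idea can be sketched in plain Python. This is only an illustration under stated assumptions: the table contents, the score-function names (`chi2`, `f_classif`, `f_regression`), and the helper `resolve_score_function` are hypothetical and are not part of the proposal or of any finalized API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table: (featureType, labelType) -> compatible score functions,
# with the default listed first. The entries are assumptions for illustration.
COMPATIBLE_SCORES: Dict[Tuple[str, str], List[str]] = {
    ("categorical", "categorical"): ["chi2"],
    ("continuous", "categorical"): ["f_classif"],
    ("continuous", "continuous"): ["f_regression"],
}


def resolve_score_function(
    feature_type: str, label_type: str, override: Optional[str] = None
) -> str:
    """Return the score function name for a feature/label type combination.

    Without an override, the first compatible entry is the default.
    An override is accepted only if it is compatible with the combination.
    """
    key = (feature_type, label_type)
    if key not in COMPATIBLE_SCORES:
        raise ValueError(f"no score function registered for {key}")
    candidates = COMPATIBLE_SCORES[key]
    if override is None:
        return candidates[0]
    if override not in candidates:
        raise ValueError(f"{override!r} is not compatible with {key}")
    return override
```

      In this sketch a combination without a table entry raises an error rather than guessing, and an override is checked against the compatible list so that advanced users cannot pick a test that does not fit the data types.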

      cc: Huaxin Gao Ruifeng Zheng Weichen Xu




            • Assignee:
              huaxingao Huaxin Gao
              mengxr Xiangrui Meng

