[IGNITE-12079] [ML][Umbrella] Add advanced preprocessing techniques - ASF JIRA

Details

Type: New Feature
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: ml
Labels: None
Ignite Flags: Docs Required, Release Notes Required

Description

Main goal:

Reduce the gap between Apache Spark and Apache Ignite in preprocessing operations. Closing this gap could help with loading Spark ML Pipelines into Ignite ML.

Next steps:

Add Frequency Encoder
Add additional Imputing Strategies (MIN, MAX, COUNT, MOST_FREQUENT, LEAST_FREQUENT)
Add RobustScaler (will be added in Spark 3.0)
Add CountVectorizer
Add FeatureHasher
Add QuantileDiscretizer
Add Locality Sensitive Hashing (LSH)
Add LabelEncoder
Add RevertStringIndexing
Add multi-column preprocessor
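
To make the first two items concrete, here is a minimal sketch in plain Python. It is not the Ignite ML API. The function names `frequency_encode` and `impute` are hypothetical, and the semantics are assumptions: frequency encoding is taken to mean replacing each category with its relative frequency in the column, and COUNT is taken to mean filling missing entries with the number of observed values.

```python
import math
from collections import Counter


def frequency_encode(values):
    """Replace each category with its relative frequency in the column.

    Assumption: "frequency" means count / total, not the raw count.
    """
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def impute(values, strategy):
    """Fill NaN entries of a numeric column using a named strategy.

    The strategy names come from the ticket; their exact semantics
    here are assumptions made for illustration.
    """
    present = [v for v in values if not math.isnan(v)]
    if not present:
        raise ValueError("column has no observed values")
    counts = Counter(present)

    if strategy == "MIN":
        fill = min(present)
    elif strategy == "MAX":
        fill = max(present)
    elif strategy == "COUNT":
        # Assumption: fill with the number of non-missing values.
        fill = float(len(present))
    elif strategy == "MOST_FREQUENT":
        fill = counts.most_common(1)[0][0]
    elif strategy == "LEAST_FREQUENT":
        # Ties resolve to the value seen first in the column.
        fill = min(counts, key=counts.get)
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    return [fill if math.isnan(v) else v for v in values]


if __name__ == "__main__":
    nan = float("nan")
    column = [2.0, nan, 2.0, 5.0, nan, 7.0, 7.0, 7.0]
    for name in ("MIN", "MAX", "COUNT", "MOST_FREQUENT", "LEAST_FREQUENT"):
        print(name, impute(column, name))
    print(frequency_encode(["a", "b", "a", "c"]))
```

A distributed implementation would compute the per-column statistics (counts, minimum, maximum) in a fitting pass over the data partitions and apply the fill value or frequency map in a separate transform step. The sketch collapses both steps into one call for brevity.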

Sub-Tasks

[ML] Add Frequency Encoding (Status: Resolved, Assignee: Alexey Zinoviev, Progress: 100%)
[ML] Add support of the additional Imputing Strategies (Status: Resolved, Assignee: Alexey Zinoviev, Progress: 100%)

People

Assignee: Alexey Zinoviev
Reporter: Alexey Zinoviev
Votes: 0
Watchers: 1

Dates

Created: 16/Aug/19 09:31
Updated: 02/Dec/20 12:00

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 40m