Description
Feature attributes, such as the feature type (continuous or categorical), feature names, feature dimension, number of categories, and number of nonzeros (support), can be useful to ML algorithms.
In SPARK-3569, we added metadata to the schema, which can be used to store feature attributes along with the dataset. We need to provide a wrapper over the Metadata class for ML usage.
The design doc is available at https://docs.google.com/document/d/1796XfSzFbZvGWFs0ky99AJhlqkOBRG1O2bUxK2N4Grk/edit?usp=sharing
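To make the idea of a wrapper over schema metadata concrete, here is a minimal sketch in plain Python. It is not the API proposed in the design doc or implemented in Spark: the class name `FeatureAttribute`, its fields, and the `ml_attr` key are all assumptions chosen for illustration, and a plain dict stands in for the Metadata class.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Hypothetical key under which ML attributes are nested in a column's metadata.
ML_ATTR_KEY = "ml_attr"

# Feature types named in the description above.
VALID_KINDS = ("continuous", "categorical")


@dataclass(frozen=True)
class FeatureAttribute:
    """Typed view over one feature's attributes (hypothetical wrapper)."""

    name: Optional[str] = None
    kind: str = "continuous"
    num_categories: Optional[int] = None  # only meaningful for categorical features
    num_nonzeros: Optional[int] = None    # support of the feature

    def __post_init__(self) -> None:
        if self.kind not in VALID_KINDS:
            raise ValueError(f"unknown feature kind: {self.kind!r}")

    def to_metadata(self) -> Dict[str, Any]:
        """Serialize to a metadata-style dict, omitting unset fields."""
        fields = {
            "name": self.name,
            "kind": self.kind,
            "num_categories": self.num_categories,
            "num_nonzeros": self.num_nonzeros,
        }
        return {ML_ATTR_KEY: {k: v for k, v in fields.items() if v is not None}}

    @classmethod
    def from_metadata(cls, metadata: Dict[str, Any]) -> "FeatureAttribute":
        """Rebuild the typed view; missing keys fall back to the defaults."""
        raw = metadata.get(ML_ATTR_KEY, {})
        return cls(
            name=raw.get("name"),
            kind=raw.get("kind", "continuous"),
            num_categories=raw.get("num_categories"),
            num_nonzeros=raw.get("num_nonzeros"),
        )


# Round trip: typed attribute -> metadata dict -> typed attribute.
attr = FeatureAttribute(name="country", kind="categorical", num_categories=3)
meta = attr.to_metadata()
restored = FeatureAttribute.from_metadata(meta)
```

The point of the sketch is that algorithms read and write typed fields, while the dataset carries only a generic metadata map, so the attributes travel with the schema without the schema needing to know about ML.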
Attachments
Issue Links
- blocks
  - SPARK-5886 Add StringIndexer (Resolved)
- is depended upon by
  - SPARK-8515 Improve ML attribute API (Resolved)
- is related to
  - SPARK-12886 Expose params to control how feature names are generated (Resolved)