Details
Type: Epic
Status: Open
Priority: Major
Resolution: Unresolved
[ML] Spot-ml code refactoring for better modularity
Description
The current spot-ml code is designed to run a pipeline that reads input data, trains a model and scores connections based on that model.
That design works well for batch processing of connections, but it is inflexible for experiments in spark-shell or notebooks. It also won't work if spot-ml needs to run in a streaming fashion for near-real-time processing.
spot-ml should be able to:
- Train a model and save it for future use, or just train and return the model.
- Read an existing model and score connections.
- Word creation should be independent of training and of reading a model. Training and scoring should receive a DataFrame that already contains words; that way, users can create their own words during experimentation.
- As with word creation, the columns selected for modeling and the filters should ship as proposed defaults, but users or contributors should be able to implement new ones.
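The separation described in the list above can be sketched roughly as follows. This is only an illustration in plain Python, not spot-ml code: a list of dicts stands in for a DataFrame, a smoothed word-frequency table stands in for the real model, and every name here (`add_words`, `default_word`, `train`, `save_model`, `load_model`, `score`, and the `dport` and `bytes` fields) is hypothetical.

```python
import json
import math
from collections import Counter
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]              # one connection; stand-in for a DataFrame row
WordScheme = Callable[[Record], str]    # pluggable word-creation scheme
RecordFilter = Callable[[Record], bool]  # pluggable filter


def default_word(rec: Record) -> str:
    """Hypothetical default scheme: destination port plus a log2 byte bucket."""
    return f"{rec['dport']}_{int(math.log2(int(rec['bytes']) + 1))}"


def add_words(records: Iterable[Record],
              scheme: WordScheme = default_word,
              keep: RecordFilter = lambda rec: True) -> List[Record]:
    """Word creation, kept separate from training and scoring."""
    return [dict(rec, word=scheme(rec)) for rec in records if keep(rec)]


def train(worded: Iterable[Record]) -> dict:
    """Train and return a model (a word-frequency table, standing in for the real one)."""
    counts = Counter(rec["word"] for rec in worded)
    return {"counts": dict(counts), "total": sum(counts.values())}


def save_model(model: dict, path: str) -> None:
    """Persist a trained model for future use."""
    with open(path, "w") as fh:
        json.dump(model, fh)


def load_model(path: str) -> dict:
    """Read an existing model back from disk."""
    with open(path) as fh:
        return json.load(fh)


def score(model: dict, worded: Iterable[Record]) -> List[Record]:
    """Attach an add-one smoothed word probability; lower means more unusual."""
    denom = model["total"] + len(model["counts"]) + 1
    return [dict(rec, score=(model["counts"].get(rec["word"], 0) + 1) / denom)
            for rec in worded]
```

Because `train` and `score` only look at the `word` field, an experimenter could pass a different scheme, for example `add_words(records, scheme=lambda rec: str(rec["dport"]))`, without touching either function. Scoring records one at a time against a model returned by `load_model` is also what a streaming caller would presumably need.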
Issues in epic
| SPOT-196 | [ML] Spot LDA Wrapper refactoring | Resolved | Ricardo Barona |
SPOT-195
[ML] Spot-ml code refactoring for better modularity