[SPOT-195] [ML] Spot-ml code refactoring for better modularity - ASF JIRA

Details

Type: Epic
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Labels:
None
Environment:
All

Epic Name:
[ML] Spot-ml code refactoring for better modularity

Description

The current spot-ml code is designed to run a single pipeline that reads input data, trains a model and scores connections with that model.
That design works well for batch processing of connections, but it is inflexible when running experiments in spark-shell or notebooks. The current design also will not work if spot-ml needs to run in a streaming fashion for near-real-time scoring.

spot-ml should be able to:

Train a model and either save it for future use or simply return it.
Read an existing model and score connections.
Create words independently of training or model reading. Training and scoring should receive a DataFrame that already contains the words, so that during experimentation users can create their own words.
Like word creation, the columns selected for modeling and the filters should be offered as proposed schemes, but users or contributors should be able to implement new ones.
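The requirements above can be sketched as a set of small, swappable interfaces. This is a minimal illustration under stated assumptions, not spot-ml's actual API: plain Scala `Seq`s stand in for Spark DataFrames so the sketch runs without a cluster, and every name (`WordCreator`, `ConnectionFilter`, `Trainer`, `ModelStore`, `PortBytesWords`, `FrequencyTrainer`, `InMemoryModelStore`) is hypothetical. The toy frequency model is a deliberately simple stand-in for the LDA model spot-ml wraps, and the convention that a lower score means a less expected connection is an assumption of this sketch.

```scala
// Hypothetical sketch of a modular design; plain Seqs stand in for Spark DataFrames.
// None of these names are real spot-ml APIs.

/** A raw connection record (hypothetical columns). */
final case class Connection(srcIp: String, dstIp: String, dstPort: Int, bytes: Long)

/** One "word" per connection, keyed by the IP that acts as the "document". */
final case class WordRow(ip: String, word: String)

/** Pluggable filter: decides which connections enter modeling. */
trait ConnectionFilter { def keep(c: Connection): Boolean }

/** Pluggable word-creation scheme, independent of training and scoring. */
trait WordCreator { def words(conns: Seq[Connection]): Seq[WordRow] }

/** A trained model scores words; in this sketch a lower score means less expected. */
trait SuspiciousConnectsModel { def score(words: Seq[WordRow]): Seq[(WordRow, Double)] }

/** Training consumes words that already exist; it never builds them itself. */
trait Trainer { def train(words: Seq[WordRow]): SuspiciousConnectsModel }

/** Persistence is separate, so callers can train-and-return or train-and-save. */
trait ModelStore {
  def save(model: SuspiciousConnectsModel, path: String): Unit
  def load(path: String): SuspiciousConnectsModel
}

// --- Example implementations (all hypothetical) ---

/** Drops connections that carried no bytes. */
object NonEmptyConnections extends ConnectionFilter {
  def keep(c: Connection): Boolean = c.bytes > 0
}

/** Word = destination port plus floor(log2(bytes)), keyed by source IP. */
object PortBytesWords extends WordCreator {
  def words(conns: Seq[Connection]): Seq[WordRow] = conns.map { c =>
    val log2Bytes = if (c.bytes <= 0) 0 else 63 - java.lang.Long.numberOfLeadingZeros(c.bytes)
    WordRow(c.srcIp, s"${c.dstPort}_$log2Bytes")
  }
}

/** Toy model: P(word | ip) from counts; unseen words score 0.0. A stand-in for LDA. */
final class FrequencyModel(probs: Map[(String, String), Double]) extends SuspiciousConnectsModel {
  def score(words: Seq[WordRow]): Seq[(WordRow, Double)] =
    words.map(w => (w, probs.getOrElse((w.ip, w.word), 0.0)))
}

/** Estimates P(word | ip) as count(ip, word) / count(ip). */
object FrequencyTrainer extends Trainer {
  def train(words: Seq[WordRow]): SuspiciousConnectsModel = {
    val totals = words.groupBy(_.ip).map { case (ip, ws) => ip -> ws.size.toDouble }
    val counts = words.groupBy(w => (w.ip, w.word)).map { case (key, ws) => key -> ws.size.toDouble }
    new FrequencyModel(counts.map { case (key @ (ip, _), n) => key -> n / totals(ip) })
  }
}

/** In-memory store for notebook-style experiments; a persistent store is left out of this sketch. */
final class InMemoryModelStore extends ModelStore {
  private val models = scala.collection.mutable.Map.empty[String, SuspiciousConnectsModel]
  def save(model: SuspiciousConnectsModel, path: String): Unit = models.update(path, model)
  def load(path: String): SuspiciousConnectsModel = models(path)
}

// Usage: each piece can be swapped without touching the others.
object Example {
  def run(conns: Seq[Connection]): Seq[(WordRow, Double)] = {
    val words = PortBytesWords.words(conns.filter(NonEmptyConnections.keep))
    val model = FrequencyTrainer.train(words)  // train and return a model...
    val store = new InMemoryModelStore
    store.save(model, "models/example")        // ...or also save it for later
    store.load("models/example").score(words)  // read an existing model and score
  }
}
```

Because words are produced before training, a user in spark-shell could replace `PortBytesWords` with a custom `WordCreator` and reuse the same `Trainer` and `ModelStore` unchanged. The same separation would let a streaming job load a saved model and only call `score`.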

Issues in epic

SPOT-196 | [ML] Spot LDA Wrapper refactoring | Resolved | Ricardo Barona

People

Assignee: Ricardo Barona

Reporter: Ricardo Barona

Votes: 0

Watchers: 2

Dates

Due: 31/Jul/17

Created: 06/Jul/17 17:58

Updated: 10/Jul/17 19:10
