Spot (Retired) / SPOT-195

[ML] Spot-ml code refactoring for better modularity

Details

    • Type: Epic
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Epic Name: [ML] Spot-ml code refactoring for better modularity

    Description

      The current spot-ml code is designed to run a pipeline that reads input data, trains a model, and scores connections based on that model.
      That design works well for batch processing of connections, but it is inflexible for experiments in spark-shell or notebooks. It also won't work if spot-ml needs to run in a streaming fashion for near-real-time processing.

      spot-ml should be able to:

      • Train a model and save it for future use, or just train and return the model.
      • Read an existing model and score connections.
      • Create words independently of training or reading a model. The training and scoring functions should receive a DataFrame that already contains words; that way, users can create their own words during experimentation.
      • As with word creation, the columns selected for modeling and the filters should ship as proposed defaults, but users or contributors should be able to implement new ones.
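
      The separation asked for above could look roughly like the sketch below. It is only an illustration under loud assumptions: it is plain Python rather than spot-ml's Spark code, a list of dicts stands in for a DataFrame, a word-frequency table stands in for whatever model spot-ml actually trains, and every name (add_words, default_word, default_filter, Model, train, score) is hypothetical and not part of spot-ml.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]            # stand-in for one DataFrame row
WordScheme = Callable[[Record], str]  # pluggable word creation
RecordFilter = Callable[[Record], bool]  # pluggable filter


def default_word(rec: Record) -> str:
    """Hypothetical proposed scheme: destination port plus protocol."""
    return f"{rec['dport']}_{rec['proto']}"


def default_filter(rec: Record) -> bool:
    """Hypothetical proposed filter: drop records without a port."""
    return rec.get("dport") is not None


def add_words(records: List[Record],
              scheme: WordScheme = default_word,
              keep: RecordFilter = default_filter) -> List[Record]:
    """Word creation, independent of training and scoring."""
    return [{**r, "word": scheme(r)} for r in records if keep(r)]


@dataclass
class Model:
    word_probs: Dict[str, float]  # relative frequency of each word
    floor: float                  # score given to unseen words

    def save(self, path: str) -> None:
        with open(path, "w") as f:
            json.dump(asdict(self), f)

    @classmethod
    def load(cls, path: str) -> "Model":
        with open(path) as f:
            return cls(**json.load(f))


def train(worded: List[Record], path: Optional[str] = None) -> Model:
    """Train on records that already carry a 'word'; optionally save."""
    counts = Counter(r["word"] for r in worded)
    total = sum(counts.values())
    model = Model({w: c / total for w, c in counts.items()},
                  floor=1.0 / (total + 1))
    if path is not None:
        model.save(path)
    return model


def score(model: Model, worded: List[Record]) -> List[Record]:
    """Score worded records against any model, fresh or loaded."""
    return [{**r, "score": model.word_probs.get(r["word"], model.floor)}
            for r in worded]
```

      With this shape, a notebook user could call add_words with their own scheme and filter, pass the result to train, and later call Model.load followed by score without retraining. Because score needs only a model and worded records, the same function could in principle be applied per micro-batch in a streaming job.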

            People

              rabarona Ricardo Barona