Spark / SPARK-21086

CrossValidator, TrainValidationSplit should preserve all models after fitting


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: ML

    Description

      I've heard multiple requests for having CrossValidatorModel and TrainValidationSplitModel preserve the full list of fitted models. This sounds very valuable.

      One decision should be made before we do this: Should we save and load the models in ML persistence? That could blow up the size of a saved Pipeline if the models are large.

      • I suggest not saving the models by default, but allowing it when the user asks for it. Whether to save the models could be specified as an extra Param on CrossValidatorModelWriter. That would require exposing CrossValidatorModelWriter as a public API and changing the return type of CrossValidatorModel.write to CrossValidatorModelWriter; this would not be a breaking change.
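
      To make the opt-in idea concrete, here is a minimal sketch in plain Python. It does not use Spark: ToyCrossValidatorModel, ToyModelWriter, persist_sub_models, and the JSON layout are all hypothetical names invented for illustration, not the actual or proposed Spark API. It only shows the shape of the suggestion: write() returns a dedicated writer, and the full list of fitted models is written only when the caller asks for it.

```python
import json
import os
import tempfile


class ToyModelWriter:
    """Hypothetical writer; a stand-in, not Spark's CrossValidatorModelWriter."""

    def __init__(self, model):
        self._model = model
        # Sub-models are NOT persisted unless the caller opts in.
        self._persist_sub_models = False

    def persist_sub_models(self, value):
        """Hypothetical option controlling whether all fitted models are saved."""
        self._persist_sub_models = bool(value)
        return self  # returned so calls can be chained before save()

    def save(self, path):
        payload = {"best_model": self._model.best_model}
        if self._persist_sub_models:
            payload["sub_models"] = self._model.sub_models
        with open(path, "w") as f:
            json.dump(payload, f)


class ToyCrossValidatorModel:
    """Hypothetical model holding the best model plus every fitted model."""

    def __init__(self, best_model, sub_models):
        self.best_model = best_model
        self.sub_models = sub_models

    def write(self):
        # Returning the specific writer type is what exposes the extra option.
        return ToyModelWriter(self)


def saved_keys(path):
    """Return the sorted top-level keys of a saved file."""
    with open(path) as f:
        return sorted(json.load(f).keys())


# Fitted models are represented here by their parameter dicts only.
model = ToyCrossValidatorModel(
    best_model={"regParam": 0.1},
    sub_models=[{"regParam": 0.01}, {"regParam": 0.1}, {"regParam": 1.0}],
)

tmp_dir = tempfile.mkdtemp()
default_path = os.path.join(tmp_dir, "default.json")
full_path = os.path.join(tmp_dir, "full.json")

model.write().save(default_path)                        # default: best model only
model.write().persist_sub_models(True).save(full_path)  # explicit opt-in

print(saved_keys(default_path))
print(saved_keys(full_path))
```

      With this shape, existing callers that only call save() on the writer see no change in what is written, and the size cost of persisting every fitted model is paid only by callers who request it.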


          People

            Assignee: Unassigned
            Reporter: Joseph K. Bradley (josephkb)
            Votes: 0
            Watchers: 6
