[SPARK-21086] CrossValidator, TrainValidationSplit should preserve all models after fitting

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.2.0
Fix Version/s: None
Component/s: ML
Labels:
- bulk-closed

Description

I've heard multiple requests for having CrossValidatorModel and TrainValidationSplitModel preserve the full list of fitted models. This sounds very valuable.
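
To make the request concrete, here is a minimal, library-free sketch of a k-fold search that keeps every fitted model rather than only the best one. It is not Spark code: `MeanModel`, `fit_mean_model`, `cross_validate`, and the `collect_sub_models` flag are hypothetical names chosen for illustration, and the "model" is a toy constant predictor.

```python
from statistics import mean


class MeanModel:
    """Toy stand-in for a fitted model: predicts one constant value."""

    def __init__(self, shrink, value):
        self.shrink = shrink  # the hyperparameter this model was fitted with
        self.value = value    # the constant prediction


def fit_mean_model(train, shrink):
    # "Fitting" here is just the training mean pulled toward zero by `shrink`.
    return MeanModel(shrink, mean(train) * (1.0 - shrink))


def cross_validate(data, shrink_grid, num_folds=3, collect_sub_models=False):
    """Return (best_model, avg_errors, sub_models).

    sub_models[f][p] is the model fitted on fold f with shrink_grid[p];
    sub_models is None when collect_sub_models is False.
    """
    folds = [data[i::num_folds] for i in range(num_folds)]
    avg_errors = [0.0] * len(shrink_grid)
    sub_models = (
        [[None] * len(shrink_grid) for _ in range(num_folds)]
        if collect_sub_models
        else None
    )
    for f, validation in enumerate(folds):
        # Train on every fold except the one held out for validation.
        train = [x for g, fold in enumerate(folds) if g != f for x in fold]
        for p, shrink in enumerate(shrink_grid):
            model = fit_mean_model(train, shrink)
            fold_error = mean((y - model.value) ** 2 for y in validation)
            avg_errors[p] += fold_error / num_folds
            if collect_sub_models:
                sub_models[f][p] = model  # keep it instead of discarding it
    best_index = min(range(len(shrink_grid)), key=lambda p: avg_errors[p])
    # Refit the winning setting on all the data as the final best model.
    best_model = fit_mean_model(data, shrink_grid[best_index])
    return best_model, avg_errors, sub_models
```

With `collect_sub_models=True`, three folds and a grid of two values yield a 3-by-2 grid of retained models. With the default, nothing extra is held in memory.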

One decision should be made before we do this: Should we save and load the models in ML persistence? That could blow up the size of a saved Pipeline if the models are large.

I suggest not saving the models by default, but allowing them to be saved if specified. Whether to save the models could be an extra Param on CrossValidatorModelWriter. That would require exposing CrossValidatorModelWriter as a public API and changing the return type of CrossValidatorModel.write to CrossValidatorModelWriter (this would not be a breaking change).
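
The opt-in persistence idea can be sketched without Spark as a writer object that saves only the best model unless told otherwise. Everything here is an assumption made for illustration: the `TunedModel` and `TunedModelWriter` classes, the `persistSubModels` option key, and the JSON file layout are hypothetical and do not describe Spark's on-disk format.

```python
import json
import os


class TunedModelWriter:
    """Hypothetical writer: sub-models are persisted only on request."""

    def __init__(self, instance):
        self._instance = instance
        self._persist_sub_models = False  # default: keep saved output small

    def option(self, key, value):
        if key != "persistSubModels":
            raise ValueError("unknown option: " + key)
        self._persist_sub_models = str(value).lower() == "true"
        return self  # allow chaining: write().option(...).save(...)

    def save(self, path):
        os.makedirs(path, exist_ok=True)
        with open(os.path.join(path, "bestModel.json"), "w") as f:
            json.dump(self._instance.best_model, f)
        if self._persist_sub_models:
            if self._instance.sub_models is None:
                raise ValueError("no sub-models were collected during fitting")
            with open(os.path.join(path, "subModels.json"), "w") as f:
                json.dump(self._instance.sub_models, f)


class TunedModel:
    """Hypothetical tuned model; models are plain JSON-serializable dicts."""

    def __init__(self, best_model, sub_models=None):
        self.best_model = best_model
        self.sub_models = sub_models

    def write(self):
        # Returning the specific writer type is what exposes `option`.
        return TunedModelWriter(self)
```

A call such as `model.write().save(path)` then writes only the best model, while `model.write().option("persistSubModels", "true").save(path)` also writes the sub-models. Returning the specific writer type from `write()` mirrors the point above about changing the return type of CrossValidatorModel.write.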

Sub-Tasks

1.	CrossValidator, TrainValidationSplit should collect all models when fitting: Scala API	Resolved	Weichen Xu
2.	CrossValidator, TrainValidationSplit should collect all models when fitting: Python API	Resolved	Weichen Xu
3.	CrossValidator, TrainValidationSplit dump sub models to disk when fitting: Scala API	Resolved	Unassigned
4.	CrossValidator, TrainValidationSplit dump sub models to disk when fitting: Python API	Resolved	Unassigned
5.	Add user guide entry for collecting sub models for cross-validation classes	Resolved	Unassigned

People

Assignee: Unassigned
Reporter: Joseph K. Bradley
Votes: 0
Watchers: 6

Dates

Created: 14/Jun/17 01:53
Updated: 21/May/19 04:14
Resolved: 21/May/19 04:14